"This record was indexed by a computer..."
Wow. New thing? Is it working out? Or full of errors?
Comments
-
Cindy Hecker said: They have been trying and testing computer technology for some time. My verdict so far is that people still do a better job. Yesterday I found an obituary clearly labeled "indexed by a computer", and it was from a partner, GenealogyBank. The names were almost all wrong. The mother's name was indexed correctly and she was listed as the mother, but the deceased, an adult daughter who died at age 22, was not. Her name was Gloria Gaviati, and the computer indexed it as Ollie Liowers. Way off! The family names of her husband, parents and married sister were in the obituary, but they ended up in weird places in the index and not all were spelled right. I was glad there was an image so I could read it myself. I am grateful it was indexed, because that is how I found it when searching on the mother, but it was full of errors.
-
Tom Huber said: Other than the GenealogyBank obituaries, I do not know of any other material where a computer was used to index records. For the most part, the indexes are accurate, but a lot depends on the original image.
-
m said: I can't remember if it was an obit or not.
-
Juli said: The computer-indexed data is all GenealogyBank obituaries, which means that non-LDS like me cannot check the accuracy without forking over $$. I therefore do not know whether the incorrect relationships and misspelled names in obituaries of The Famous Relative are original to the newspapers, or created by the computer index.
-
Tom Huber said: I have encountered some obituaries where there were a lot of errors, even in the newspaper copy. I’m not surprised that there are bad spellings, relationship errors and so on.
-
m said: It might have been the obit of a person who died in the 1800s, attached to a person born in the 1600s, which I detached without reading since the match was impossible.
-
Adrian Bruce said: Both Ancestry and FindMyPast, as well as other sites like TROVE, use OCR to index sources like directories, electoral registers and newspapers - some of those indexes may get through into FS.
As I see it, there are essentially two problem areas:
1. Legibility of individual words - the classic OCR problems;
2. A failure to parse the block of OCR'd text correctly into the right items of given names, family names, addresses, etc. I'm sure we've all seen things like an address of 32 John Street being interpreted as a person of that name. Parsing is easy enough where the format is fixed, less so if, like electoral registers, the format constantly changes.
-
Juli said: All of the online repository sites do some form of OCR-based search for printed materials. The difference between those and the "indexed by computer" obituaries is that the latter attempt to actually interpret the words to find the names and the relationships between them.
-
Adrian Bruce said: That's interesting, Juli, thanks. I know that FindMyPast have attempted a degree of interpretation / parsing on the UK electoral registers. Their first OCR attempt was quite crude in its interpretation and suffered from problems such as taking a given name from one line and the surname from the next. They have a second collection on the same images with rather better interpretation / parsing, though I don't know if it's complete yet or how successful it is overall; our registers do change format a lot.
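To make the parsing problem concrete, here is a minimal Python sketch, assuming a hypothetical fixed layout of one register entry per line as "Surname, Given names, Address". The function name `parse_entry` and the short list of street words are illustrative inventions, not anything Ancestry, FindMyPast or FamilySearch are known to use. Rather than guessing, it sends any line that doesn't fit the layout, or whose name fields look like an address, to human review.

```python
import re

# Hypothetical street-type words; a real register would need a much fuller list.
STREET_WORDS = {"street", "st", "road", "rd", "lane", "avenue", "terrace"}


def parse_entry(line: str) -> dict:
    """Parse one OCR'd line assumed to read 'Surname, Given names, Address'."""
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != 3:
        # Line does not match the assumed layout: flag it instead of guessing.
        return {"status": "needs_review", "raw": line}
    surname, given, address = parts
    # A name field that is empty, starts with a house number, or ends in a
    # street word suggests the fields were split in the wrong place
    # (e.g. "32 John Street" read as a person).
    for field in (surname, given):
        words = field.lower().split()
        if not words or re.match(r"^\d", field) or words[-1] in STREET_WORDS:
            return {"status": "needs_review", "raw": line}
    return {"status": "ok", "surname": surname, "given": given, "address": address}
```

For example, `parse_entry("Smith, John, 32 John Street")` is accepted, while `parse_entry("32 John Street, Smith, John")` is flagged because its "surname" starts with a house number. As Adrian notes, this kind of check only works while the layout stays fixed; a register that changes format would need a different rule for each layout.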
-
m said: If it is OCR, then I don't think it was handwritten text. So it was probably that one obit I detached for being 200 years out of place.
-
A year later, computer indexing is no longer confined to OCR'd obituaries: FS is now trying it on other records. I am currently looking at church records from the 1800s in the Philippines that were "indexed by computer". EVERY ONE of them has at least one transcription error. The indexing also misses names that are split across two lines (which happens regularly), and it regularly fails to identify the principal person's relationship to the other people in the record. It also indexes the names of public officials and priests, which usually have little genealogical value and just clutter the index and source links.
Furthermore, if you attach one of these records as a source, the source detail does NOT show (1) the event type, (2) the event date or (3) the event place, even though that data has been captured. To see this information, you have to review the attachment in the source linker and open the primary person's details.
In my opinion, indexes done solely by computer, especially for handwritten text, are not yet ready to be released directly for use. It would be better to have the computer do the first pass in the two-step indexing process and then have a human review it.
-
My guess is that the index editing tool is meant not just to correct individual records but also to:
- train the handwriting OCR system
- detect the most badly transcribed batches and flag them for re-processing
There are simply too many historical records waiting to be processed to do them all by hand, and the backlog must be growing rapidly.
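The batch-flagging idea is only a guess, but it can be sketched in a few lines of Python. This assumes, hypothetically, that each human review records whether the reviewer had to correct the machine index; `flag_batches` and its 0.5 threshold are made up for illustration and say nothing about how FamilySearch actually works. Batches where more than that share of records needed fixing are returned worst-first as candidates for re-processing.

```python
from collections import defaultdict


def flag_batches(reviews, threshold=0.5):
    """Return batch ids whose share of human-corrected records exceeds threshold.

    reviews: iterable of (batch_id, was_corrected) pairs, one per reviewed record.
    The default threshold of 0.5 is an arbitrary illustrative value.
    """
    totals = defaultdict(int)     # records reviewed per batch
    corrected = defaultdict(int)  # records a reviewer had to fix per batch
    for batch_id, was_corrected in reviews:
        totals[batch_id] += 1
        if was_corrected:
            corrected[batch_id] += 1
    rates = {b: corrected[b] / totals[b] for b in totals}
    flagged = [b for b, rate in rates.items() if rate > threshold]
    # Worst batches first, so they can be queued for re-processing.
    return sorted(flagged, key=lambda b: rates[b], reverse=True)
```

With reviews `[("A", True), ("A", True), ("B", True), ("B", False), ("C", False)]`, batch A (every record corrected) is flagged at the default threshold, and lowering the threshold to 0.4 also catches batch B.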
-
Machine intelligence and OCR are always a moving target, but consider that the OCR used to transcribe a particular newspaper article may have been run 20 years ago and never redone since. Compared with what is available today, that technology can be like night and day.
So it is not always easy to tell whether a bad result we see today came from something done 20 years ago, or from something done more recently but with 20-year-old technology.
In short, it is hard to judge today's technology by any one item when you don't know when and how it was processed.