AI training material for French/German script records
My family is from the town of Rehlingen, Saarlouis, Saarland, Germany. The civil registration films for this town have been fully digitized, but not fully indexed. The church books for this town have been indexed in Latin. These records date from 1792 and extend into the 1870s.
Because the records are not indexed, and because my family lines tie in to most of the families in the town, I have been indexing these records myself in my own Excel sheet, image by image. The records begin in fully handwritten French, then include French printed forms, then move into German script on printed forms, where an entry can mix English and German script characters.
I read the article in the Deseret News on March 18, 2023 that discussed how FamilySearch is using AI tools to teach the computer how to read Spanish, English, and Portuguese. In my work with AI, my idea had been to use two deep convolutional neural networks (CNNs): a first one taught to write handwritten French, which would generate training samples for a second one that would then learn to read handwritten French. The second CNN would then be used to construct the first-pass index of the records. The same approach would be used to teach the computer to write and read the various forms of German script written by different writers.
The challenge in implementing this with my indexes is the number of training samples needed to train the CNN so that it can successfully emulate a number of handwriting styles, so I have not yet tried this out.
However, I believe my current work, which has fully indexed all the images in FHL film 1050646 (1005 records: marriages, births, and deaths), and my ongoing work on FHL film 1050647 (550 records so far: marriages and births) may be useful for providing training data. Each of these records contains a number of extracted words and dates that could all be captured and used as training information. Since there isn't a formal indexing project for these films (and I think it would be difficult to use a traditional approach due to the diverse record formats), I wanted to offer my data as a resource if it would be useful to the AI effort.
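To illustrate how a spreadsheet like mine could be turned into training labels, here is a minimal sketch in Python. It assumes the sheet has been exported to CSV. The column names (film, image, record_type, surname, given_name, event_date), the image numbers, and the sample rows are all made up for illustration and are not the actual layout or contents of my index.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingLabel:
    """One transcribed value tied to the image it was read from."""
    film: str
    image: str
    field: str
    text: str


# Hypothetical columns that identify the image rather than hold a transcription.
ID_COLUMNS = ("film", "image")


def rows_to_labels(csv_text):
    """Turn each non-empty cell of an exported index into one TrainingLabel."""
    labels = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for field, value in row.items():
            # Skip identifier columns and any overflow cells without a header.
            if field is None or field in ID_COLUMNS:
                continue
            text = (value or "").strip()
            if text:
                labels.append(
                    TrainingLabel(row["film"], row["image"], field, text)
                )
    return labels


# Placeholder rows only; the names and dates are invented for this sketch.
SAMPLE = """film,image,record_type,surname,given_name,event_date
1050646,12,birth,Example,Anna,1801-03-04
1050646,13,marriage,Sample,,1802-11-20
"""

labels = rows_to_labels(SAMPLE)
for label in labels:
    print(label)
```

Each label here only says that a word or date appears somewhere on a given image. A reading model would likely still need to know where on the image each value sits, which this sketch does not attempt.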
I expect I am not the first person to do this kind of work on a town, but I would be happy to help prepare the training samples if it would help accelerate the AI work in the French and German language space. German script is one of the more difficult things to learn to read, because its characters are not the handwriting that even German speakers today learn in school.
Please feel free to reach out directly to me via email or phone if there is interest in reviewing my indexed data.
Comments
We appreciate your offer. This has been submitted to those working on AI indexing. Thanks so much.