User-Initiated, AI-Guided Indexing of Collections of Interest
I was eating lunch and had this idea and wanted to share it to see if anyone else thought it would be feasible or even a good idea.
When my colleagues or I do research, we often have to search an entire volume page by page, either because there is no index whatsoever for that volume, or because we are looking for persons outside the scope of the typical name index that already exists for it (grantee/grantor, the deceased person whose estate is being probated, etc.).
My idea is that, for each collection in the FamilySearch catalog that has not been indexed yet, a user could do one of the following: 1) Submit a request and get an estimate of when a formal AI-guided indexing project will get to the desired record set. [If one learns that the collection is expected to be indexed within an acceptable time frame (weeks or months), one may elect to put off searching that collection and complete other research tasks first.] 2) Open an AI-guided indexing session that allows the user to start indexing a given volume for each name listed in it. Collections that have been subject to user-initiated indexing could be designated in the catalog as partially indexed, with a report, displayed on mouse click, showing which images were indexed, so another user can get a sense of how much of the volume was reviewed by prior researchers.
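To make the "partially indexed" report a little more concrete, here is a minimal sketch in Python of how such coverage could be tracked for one volume. Everything in it is hypothetical: the `VolumeCoverage` class, its method names, and the status labels are my own illustration and are not part of any existing FamilySearch system.

```python
from dataclasses import dataclass, field


@dataclass
class VolumeCoverage:
    """Hypothetical record of which images in one volume users have indexed."""
    total_images: int
    indexed: set[int] = field(default_factory=set)

    def mark_indexed(self, first: int, last: int) -> None:
        """Record that images first..last (inclusive, 1-based) were indexed."""
        if not (1 <= first <= last <= self.total_images):
            raise ValueError("image range outside the volume")
        self.indexed.update(range(first, last + 1))

    def status(self) -> str:
        """Catalog label for the volume (labels are assumed, not official)."""
        if not self.indexed:
            return "not indexed"
        if len(self.indexed) == self.total_images:
            return "fully indexed"
        return "partially indexed"

    def indexed_ranges(self) -> list[tuple[int, int]]:
        """Collapse indexed image numbers into contiguous (first, last) ranges."""
        ranges: list[tuple[int, int]] = []
        for n in sorted(self.indexed):
            if ranges and n == ranges[-1][1] + 1:
                ranges[-1] = (ranges[-1][0], n)  # extend the current range
            else:
                ranges.append((n, n))  # start a new range
        return ranges

    def report(self) -> str:
        """One-line summary that a catalog entry could show on mouse click."""
        spans = ", ".join(f"{a}-{b}" for a, b in self.indexed_ranges()) or "none"
        return (f"{self.status()}: {len(self.indexed)} of "
                f"{self.total_images} images ({spans})")


volume = VolumeCoverage(total_images=200)
volume.mark_indexed(1, 40)      # one researcher's session
volume.mark_indexed(100, 120)   # a later researcher's session
print(volume.report())
# partially indexed: 61 of 200 images (1-40, 100-120)
```

Storing individual image numbers rather than a single percentage is what lets the next researcher see exactly which pages still need a page-by-page read.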
Overall, I think it would be a great idea to have a user feedback option that helps the FamilySearch staff determine which collections are in the most demand by the research community, and to give users the tools to help index collections that might not be at the top of the priority list, especially in non-North American countries. Obviously, at some point, a formal official review of a collection indexed through user-initiated efforts should occur. But every volume handled by persons with a special interest in that collection would be one less volume the general indexing program would have to index from scratch; what remains would be a matter of quality assurance.
For example, if one has to review a court minute book page by page, wouldn't it be nice to spare the next researcher from reading through each page line by line like you had to? I have had this thought on several occasions when planning a page-by-page search of a collection I wished I had an index for beforehand, but how would I share my efforts so that they help the most people possible? Suppose I make a spreadsheet and put it on a personal website. In that case, unless someone checks the research wiki for the relevant jurisdiction to see whether anyone has mentioned a third-party resource for that record set, there is little chance of anyone else being aware of my work, let alone using it. Making such privately initiated indexes available to others interested in searching the same collection takes a great deal of proactivity, time, and effort from anyone so inclined. These private efforts might also become redundant in coming years, with larger-scale official projects going over the collection unaware of any personal effort to index that record set for a given jurisdiction or institution. The internet is a wild place where private sites can disappear for several reasons. The most frequent is that a personal website's domain expires because the owner is unable to renew it or to maintain the site with the hosting provider or domain registrar. Many small private genealogy websites have disappeared, along with valuable research aids, because the original creator died or became incapacitated and nobody bothered to keep the site running; some of them were never crawled and captured by the Internet Archive.
Wouldn't it be nice to help create an index for a lower-priority record collection that the main indexing program might never get to, and to know at the end of the day that one's efforts are not going to disappear forever once one becomes incapacitated or dies?
That is what I mean by user-initiated, special-interest, AI-guided indexing. I understand that the AI system might not be able to handle all the user-initiated indexing sessions at once, so perhaps letting users reserve a time slot in which to search and create a name index for a given collection would make it practical to run within FamilySearch's existing infrastructure. Giving advanced users tools that help everyone else, while they do something immediately relevant to their own research, would be highly beneficial: it advances personal interests through voluntary efforts that can bless and help others with interests in the same collections.
On a scale of 1-10, how likely is the FamilySearch development team to implement, or even consider developing, such a system?
Maybe nobody else will care about this idea or set of ideas of mine. Still, at the very least, I feel it is my duty to share them, so that those who can decide whether anything can or should be done with them have at least a chance of hearing about them.
Comments
-
The idea of users being able to request that a film be prioritized for AI-assisted indexing is one that I support. However, because the permission of the record owner/custodian is needed to index their records, it will not be possible to start indexing immediately. It is also time-consuming and inefficient for FamilySearch to have to seek permission for lots of small collections. From their perspective, it is best to start with the record custodians with the most records (e.g. NARA and TNA, the US and UK national archives) and then move on to record custodians with smaller collections. For the very smallest collections, it may never be economical/affordable for FS to seek permission unless there is very strong interest from users, which your proposed feature would help identify.
But despite the challenges, a similar thing was done during the digitization process: users were able to contact FamilySearch and request that films be digitized. FamilySearch then digitized the record and checked whether it had permission to publish. If FS had permission to publish it, it was typically available a few weeks after the request.
-
Great to see another idea (even one at lunch) person out there ... I like your idea.
Along the lines of your idea, here is yet another idea (sorry if it is not the type of feedback you were looking for; I'll have to revisit this later).
It would be great to have user-contributed transcripts, or at least the ability to comment (if desired), on collections/volumes that particular researchers value or access repeatedly. So essentially, as long as the collection/volume grants that kind of permission (so there might need to be a way to track that), a digital version could be crowd-sourced and made available in some format. FamilySearch would be a great place for that, in my opinion... think open, crowd-sourced collections, subject to record custodian permissions. So if you were searching a currently unindexed collection, you could add an index/transcription, at least for the people you find, as submissions that come before the official index...
I don't know how much researchers would adopt this approach - but hey I was on lunch too ...
-
I don't work in the right part of the world to have encountered this "AI-assist" that you mention, but I've often thought as I was wading my way through images: what if I could read the names I see aloud and have them transcribed that way? It wouldn't work at all in English, because it doesn't have spelling rules, but in Hungarian or German, we could create a semi-decent finding aid with minimal extra attention.