Old Idea: CAPTCHA-like Indexing/Reviewing?
In the past I have thought (as have others - apparently that old link has been archived/deleted) that it would be interesting if FamilySearch implemented this type of Indexing/reviewing.
Basic idea:
Require or allow users to set a requirement to index X # of records at FamilySearch login. Requiring data-entry of 1 field of a record doesn't take too much time from the user.
Benefit:
Users give back to the community by Indexing records - Indexing is a voluntary activity - but what if it were a requirement - for example, at FamilySearch login to index at least 1 field of a record? OR perhaps the user likes the idea of giving back and so would like to have a setting that requires indexing of X # of records before they use the Tree or Search Records. Plus some users probably don't have the time to dedicate to Indexing a batch - even though they have a default 7 days or whatever the project default is to complete it. As it currently stands unless a Tree user is dedicated to getting an indexing batch done - they may not even wander over to Web Indexing. This idea would allow them to do Indexing even though their primary focus is on researching the Tree/Records.
Disadvantage:
Some users probably wouldn't like this requirement - that's why I am suggesting a setting that allows them to choose if/when this 'requirement' is enabled.
Comments
-
There are other activities that could be done that are similar to what you are suggesting. I like to do Reverse Indexing when I want a diversion. https://familytech.byu.edu/apps/reverse-indexing.html
-
@Melissa S Himes This suggestion would not be a diversion - it would be a setting making it a requirement or not.
-
I guess I am confused by the idea to only index 1 field of a record.
-
@Melissa S Himes If you are familiar with CAPTCHA, you know that it requires the computer user to input a response to an image (usually - though I have noticed some sites don't appear to check that the response is correct?). In this case the idea would be similar: require the user to input a transcription for an Indexing image - but only for one or X # of fields.
Just as you mention previously/elsewhere - any Indexer can Return Batch at any time - so indexing 1 field shouldn't be any different - as long as the image database could handle the file locking seamlessly.
It would just be an interesting way of having people participate in Indexing without going to the Indexing>Web Indexing app. Can you imagine how much more indexing would occur if all FamilySearch users were 'required' to index 1 field when they login? Mainly for people that don't have dedicated time to complete a full record/batch - it would allow them to contribute toward indexing. It would be especially applicable on the mobile app because 1 field would fit on a mobile device better than an entire record image. Just as Twitter shortens communications it would be shortened Indexing 😀
By all means though - if a complete record is a better snippet for the user to input an indexing response to rather than just 1 field that would be great or if they were presented with a reverse indexing panel to 'match' that would be great too. I was just thinking 1 field is definitely short enough to have the user not get too bothered by it when logging in - and plus if they wouldn't like that - they wouldn't have to enable the setting.
Repurposed story I've heard before about these types of questions:
A boy was walking along a web indexing page and saw a lot of indexing fields lying on the indexing form for that page. He transcribed a field and pressed Return Batch. Another indexer came along and said, "Why would you index one field and Return Batch? You'll never finish a batch if you index one field." The boy thought about it and replied, "I might not get a bunch of batches done, but I can index a field a day and Return Batch, and that helps get the batch done."
-
Looks like BYU Family History Lab: Indexing Go app has taken this idea under initial development. Apparently it has some merit 😀 though no 'Likes' 🙄
-
FYI
It's 'Brett'.
Just in passing ...
Although, I understand, why you made this suggestion ...
I am sorry ...
I am 'vehemently' opposed, to such ...
I DO NOT want, ANY User/Patron, DOING "Indexing"; BECAUSE, they are FORCED to.
I WANT, Users/Patrons, DOING "Indexing"; BECAUSE, they WANT to.
Forcing, Users/Patrons, into DOING "Indexing", WILL only PROMOTE, "Poor" QUALITY "Indexing".
It is BAD enough, with regards to the QUALITY of "Indexing", with Wards/Branches/Stakes/District PROMOTING, the likes, of "Indexing", COMPETITIONS; or, CHALLENGES, let alone, FORCING Users/Patrons, who DO NOT want to do it, to HAVE to do it.
And ...
There should be NO requirement, to "OPT" either, "IN"; and/or, "OUT".
As, there, is; and, there should be, NO requirement to do "Indexing" ... period.
Users/Patrons, ALREADY 'pay back', by participating in the "Family Tree" Part, of 'FamilySearch'.
[ And, for Members of the Church, DOING "Temple" Work (or, JUST "Sharing with the Temple System" ... ]
And, for Users/Patrons, who do not participate in the "Family Tree" Part, of 'FamilySearch', that is just fine.
[ ie. those, who ONLY use 'FamilySearch', for the "Records", therein ... ]
I know of, MANY; Many; many, Members of the Church, who DO NOT "Index".
[ As, they ALREADY have enough to do, with their, Family; Work; &, 'Callings' ... MANY hold MULTIPLE 'Callings' ]
[ Let alone, ALSO doing their OWN "Temple and Family History" Work ... ]
There should be NO mechanism, whatsoever, in place, in 'FamilySearch', to FORCE Users/Patrons, to "Index".
Again ...
I am sorry ...
I am 'vehemently' opposed, to ANY mechanism, in 'FamilySearch', that would FORCE Users/Patrons, to 'Index'.
[ That is just ... NOT ON ... ]
Just my thoughts.
Brett
-
Interesting ... Ho hum ... Thanks for the verbosity (as always)... I don't know anyone that indexes because they are 'forced to'. The quality of indexing has more to do with human error - or AI error if you like that indexing better - than any supposed compulsion. I understand the Utah State Prison has been one of the most productive indexing groups in the world - I don't think any inmate there is forced to participate. This idea wouldn't compel any FamilySearch user to index.
Cool - something else you oppose me on. 'To each their own' ... Be offended, vehemently opposed ... or not ... I can't help your opinion or inability to see the idea on its merits. Perhaps let BYU FH Lab know of your vehement opposition. I, on the other hand, like this Idea. But it certainly is not a 'big enough deal' to be offended at, even IF one field entry WAS 'required' at login (but sure, be offended at FamilySearch's use of the word 'guest' ...).
It's an idea with merit - a way for those with little time to participate in indexing - should they choose to participate in magnifying the crowd-sourced opportunity. Also perhaps an unobtrusive way to resolve some 'index issues'.
G'day
-
FYI
Well ...
That appears to be, contrary, to your original Idea ...
Quote:
------------------
Basic idea:
Require or allow users to set a requirement to index X # of records at FamilySearch login. Requiring data-entry of 1 field of a record doesn't take too much time from the user.
Benefit:
Users give back to the community by Indexing records - Indexing is a voluntary activity - but what if it were a requirement - for example, at FamilySearch login to index at least 1 field of a record? OR perhaps the user likes the idea of giving back and so would like to have a setting that requires indexing of X # of records before they use the Tree or Search Records. Plus some users probably don't have the time to dedicate to Indexing a batch - even though they have a default 7 days or whatever the project default is to complete it. As it currently stands unless a Tree user is dedicated to getting an indexing batch done - they may not even wander over to Web Indexing. This idea would allow them to do Indexing even though their primary focus is on researching the Tree/Records.
------------------
That seems to imply a REQUIREMENT ...
As such ...
I am NOW totally confused ...
Brett
-
Yep ... Read as you will ... Be confused as you will ... Read the rest with an open mind. The 'basic idea' - 'allow users to set' - implies 'a setting' (i.e. an option like all the other Account Settings options). A toggle setting is off or on - again, an option.
...
My question: why would any 'guest' be offended at a host's 'requirement' to wash their hands before dinner, or take off their shoes upon entering the host's home? I look at FamilySearch's graciousness in storing, imaging, contracting, publishing, and 'hosting huge amounts of data' over many years as a 'host' welcoming a guest to use said records/resources. I take no offense at the word 'guest', nor would I take offense IF this Idea were a requirement. Instead the idea expressly allows the user to choose ... So yes - your whole argument is befuddling/confusing/misrepresentational.
-
FYI
I did read, your "Idea", with 'an open mind' ...
[ Hence, why, I started off by saying ... I understand, why you made this suggestion ... ]
And ...
As, I suggested ...
I still maintain ...
There should be NO requirement [ or, need ], to "OPT" either, "IN"; and/or, "OUT".
As, there, is; and, there should be, NO requirement to do "Indexing" ... period.
Brett
ps: Many Users/Patrons, find the "System", difficult enough to use; as, it is, such would be placing ANOTHER "Level" of COMPLEXITY; and BURDEN, that MOST Users/Patrons, DO NOT, want; nor, need - ESPECIALLY, for the "Newbies" and "Inexperienced" (or, even, the "Occasional" User/Patron).
pps: I am NOT having a [ Personal ] go at you ...
[ NOR, did (or, do) I, make any Personal 'Slight', upon your 'Style' ... Those references are not warranted ... ]
-
Oh but you are ... You are belittling, intentionally misrepresenting essential facts of an IDEA (that was typed in black and 'vanilla') ... Of course from your opinion/view point - but you aren't being 'constructive'. I think anyone who reads the complete thread with an open mind would know that the idea doesn't advocate force/compulsion as you intend/misrepresent it. If one opts out or doesn't enable it - is that 'forcing' someone?
"Disadvantage:
Some users probably wouldn't like this requirement - that's why I am suggesting a setting that allows them to choose if/when this 'requirement' is enabled."
"This suggestion would not be a diversion - it would be a setting making it a requirement or not."
You conveniently left that paragraph and initial response to Melissa out of your critique that you quoted earlier ... why? well I guess so you could 'vehemently oppose' the idea?
So field entry of one word (i.e. name, place, date ) is too 'complex' a burden (for anyone interested in family history)??
I'm just defending the idea with logic and reference to other conversations where you promote your ideas/opinions with 'vehement verbosity/offense' ... We have had other threads where we tend to disagree about this or that. Your style is yours and mine is mine ... We have agreed to disagree before ...
Hey it's no skin off my nose - I will put my ideas out there - anyone can agree or disagree or ignore them ... or any combination thereof. My post today - being a constructive post - provided a link to BYU FH Lab: Indexing GO app - which is an implementation of this idea.
-
FYI
'No' ...
I am NOT ... belittling an IDEA ...
I am giving MY "Reasons", for OPPOSING, such an "Idea" ... that is NOT, "Belittling" ...
MANY of my "Ideas", are opposed to, by others; but, I DO NOT, take offence at such ...
[ NOR, do I make a personal ATTACK, upon their 'Style' ... ]
And, 'No', I did not conveniently, leave that part out.
The IMPORTANT parts, were in, those two paragraphs.
Said my piece ...
I am done ... I am out ...
Good Luck.
Brett
-
Hmmm ... 'Vehement opposition'... Not belittling? be·lit·tle verb
- make (someone or something) seem unimportant. "this is not to belittle his role" Similar: disparage denigrate
Ok maybe not completely the most precise word to use ... But vehement opposition while misrepresenting essential facts is certainly not 'constructive'.
Why? Reasons: 'Vehement opposition'... Red herring ...
"The important parts, were in, those two paragraphs." You mean important in order to argue against a 'forced' setting that is optional??? ... Red herring ... And not constructive ... And for what purpose/motivation ... Consider the initial post from today - why this response ... ??
Interesting ... G'day
-
I must admit that I actually agree with Brett (if I'm skimming his responses correctly): making anything mandatory is a Supremely Awful Idea. Nobody likes doing anything -- no matter how easy or even entertaining -- if it's required.
The part that doesn't make sense to me is the idea that people can be expected to look at a handwritten page and decipher just one thing from it. That's not how paleography works. It often takes me several pages before I can make sense of the handwriting, and I sometimes find myself paging through several years of records looking for more examples of a particularly difficult name. This is one of the things I don't like about FS's indexing setup: you get your page or three for indexing plus maybe two more, and that's it. There's no guarantee that the next batch in the same project will be from the same location, so anything you figure out about the names and the clerk's handwriting is likely to be wasted. It sounds to me like this captcha idea magnifies this problem, making basically everything into mostly-wasted effort with almost nothing to show for it.
I suppose it's different if what you're indexing is typewritten or printed. I have no actual experience with that.
-
That's fine - but maybe read more thoroughly and you should see this would be an optional self-enrolled setting - therefore not a requirement unless the user set that. Is FamilySearch login required? How much additional burden would entering a field (Name, Date, Place) add? If someone can't enter 1 field - because they are already 'time burdened' - why are they logging in?
Thanks for your constructive assessment though. You are right: sometimes a segment might be difficult to decipher alone. Perhaps the implementation will be able to use AI to differentiate 'well-formed letters' and only add those to the queue? But certainly indexing doesn't require a paleography degree - nor a complete document - just general familiarity with different handwriting. I don't know how they are implementing segmentation, but I would assume some AI involvement? In addition, in my experience, if the batch includes reference images, there are usually 3-5 before and 3-5 after the image you're indexing. Even then I don't know that one can tell if they are in the same order as the physical book/manuscript was arranged. I would assume so (hence calling them 'reference images'), but sometimes it seems they might be rearranged.
The main rebuttal to your criticism is: Entry of 1 (or a few) indexing fields - especially magnified by crowd-sourced adoption of this sort of implementation - could significantly increase indexing output. I cannot help your seeing the idea as 'wasted effort' (now that actually might be belittling the idea). But yes - not having reference/document context could affect quality of indexing - it might be interesting to attempt to quantify that ...
-
I wouldn't be surprised if this is another app to help train AI - just like reverse indexing. I have tried a few of the Indexing Go words and determined that you can't read a lot of the words without seeing the context and the scribe's handwriting. That might be the next BYU Tech Lab conclusion after the experiment. Time will tell.
-
The few I saw were 'easily decipherable'.
I would not doubt it is part of AI routine - because segmentation is required before character/word recognition. Yes, it will be interesting to see what conclusions are reached.
-
On question-and-answer forums, whenever people post a snippet containing just the bit they can't decipher, the first and immediate response from all of the experienced paleographers is invariably "we need more context". It seems to me that trying to "teach" a computer to read handwriting in snippets makes the task much more difficult than it should be. Reading handwriting involves knowing already what it might say, but a single snippet is woefully inadequate for coming up with the necessary "candidate pool".
For example, here are four snippets for your consideration:
Without any more context than this, do you have any idea what they say? Hint: they're all the same spelling of the same name. Second hint: it's neither "Salem" nor "Solom" (although one of them was indexed as each of those).
The answer is Selem, which is an old variant spelling of the Hungarian surname Selyem "silk" (short for a silk-weaver or -merchant).
-
Interesting ...
The first (top left) is more clearly Selem - the second (top right) is almost as clear. The bottom two (especially the left one) are not as clear - and I could see variation occurring in the index.
Good point. Yes - the language of the record would be a good thing to know. And yes - that generally is known - but perhaps AI would need more than a snippet to figure that out. I guess it depends on how much of a snippet - but a human indexer would probably not have too much problem with these snippets if known that it was from a record in their native Hungarian language. I don't know if Salem, Selam, Salam or Solom have any common meaning/familiarity in Hungarian. So yes - batch/snippet identification of the record (collection title or at least language) should be included - that text probably wouldn't be too difficult to display above/below the snippet. Since these 4 examples are 'old variant spelling of surname' - perhaps the snippet could also be identified (AI I suppose) as Name perhaps even Surname and of course the collection would probably be dated with a certain Date Range (which does give the native language indexer more context from which to decipher). Most of this could be parsed from values entered in pre-indexing preparation of the collection and again text attached to the snippet. From the examples at the Indexing Go app it seemed most I was presented with were of category - occupation - interesting use of the idea.
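A minimal sketch of attaching context text to a snippet, assuming a hypothetical `SnippetTask` record whose fields (field type, collection title, language, date range) would be filled in during pre-indexing preparation of a collection. None of these names come from FamilySearch or the Indexing Go app; they only illustrate how that context could travel with a snippet and be displayed above or below it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SnippetTask:
    """One image snippet queued for single-field indexing (hypothetical structure)."""
    image_url: str          # hypothetical location of the cropped snippet image
    field_type: str         # e.g. "Surname", "Given Name", "Occupation"
    collection_title: str   # copied from pre-indexing collection setup
    language: str           # language the record is written in
    date_range: Optional[Tuple[int, int]] = None  # (start_year, end_year) of the collection

    def prompt(self) -> str:
        """Build the one-line context shown to the user alongside the snippet."""
        parts = [
            f"Field: {self.field_type}",
            f"Collection: {self.collection_title}",
            f"Language: {self.language}",
        ]
        if self.date_range is not None:
            start, end = self.date_range
            parts.append(f"Years: {start}-{end}")
        return " | ".join(parts)
```

For example, `SnippetTask("https://example.org/snippets/0001.png", "Surname", "Example parish register", "Latin", (1700, 1799)).prompt()` returns `"Field: Surname | Collection: Example parish register | Language: Latin | Years: 1700-1799"`. The URL and collection title here are placeholders.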
I can't give you an algorithm for how to decipher letters/words - even in your small 'test the idea' case. But with this particular example - AI would probably not get them each transcribed correctly - nor would most humans (I suspect). But what would you rather have in the index: Selem or S?l?m or as you indicate the more modern Selyem - for each of these 4? Or would you rather have both the old variant(s) and the modern spelling? Again you have ignored the possibility of AI separating out snippets with 'well-formed' letters - perhaps it would not even queue the bottom 2 examples - or if it did would mainly be looking for suggestions on the vowels? What are the first/given names not displayed for these snippets? Would the average native Hungarian reader understand they were Surname if the snippet included Given Name? If so - then for Hungarian records make sure Name field snippets include full name. I suppose that would be enough context to help an average native Hungarian to decipher a majority of Hungarian names (especially if the average Hungarian writes in Hungarian)?
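The "only queue snippets with well-formed letters" suggestion could be sketched as a simple confidence threshold. This assumes a hypothetical handwriting-recognition model that gives each snippet a score between 0.0 and 1.0; the `Snippet` type, the `route_snippets` helper, the 0.8 threshold, and the scores in the usage example are all made-up illustrative values, not measurements of any real snippets or a description of how Indexing Go works.

```python
from typing import Dict, List, NamedTuple


class Snippet(NamedTuple):
    snippet_id: str
    confidence: float  # hypothetical 0.0-1.0 "legibility" score from a recognition model


def route_snippets(snippets: List[Snippet], threshold: float = 0.8) -> Dict[str, List[str]]:
    """Split snippets into those offered to any user at login and those held back.

    Snippets scoring at or above the threshold go to the general queue; the rest
    are held back (e.g. left for full-page indexing or more experienced indexers).
    """
    routed: Dict[str, List[str]] = {"general_queue": [], "held_back": []}
    for s in snippets:
        key = "general_queue" if s.confidence >= threshold else "held_back"
        routed[key].append(s.snippet_id)
    return routed
```

With invented scores, `route_snippets([Snippet("a", 0.93), Snippet("b", 0.85), Snippet("c", 0.41), Snippet("d", 0.62)])` returns `{"general_queue": ["a", "b"], "held_back": ["c", "d"]}`. The hard part in practice would be producing a trustworthy score, which this sketch simply assumes.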
-
One of the many complicating factors is that these snippets are from Catholic registers from the 1700s (from a town that's now in Slovakia). This means that they are written in Latin, and this affects the given names. The person with this surname was recorded as Ignatius. In pre-WWI Hungarian spelling, that would be Ignácz; the fully-modern spelling is Ignác. I'm a name hobbyist through and through, so I know these sorts of equivalences without even needing to think about them, but your average Hungarian indexer will not necessarily know that Emericus = Imre or Ludovicus = Lajos, and will not always decipher a Ladislaus (= László) or Valentinus (= Bálint) correctly.
The top right snippet above came from this page: https://www.familysearch.org/ark:/61903/3:1:9Q97-Y39N-7S5?i=46&cc=1554443. It's the second-from-last entry on the page, just below "September!". (I don't know who added all the exclamation marks to this register and why.) The child's given name was indexed as "Priscilia", but I'm pretty sure it actually says Rosalia. This is on the principle that when you hear hoofbeats in Texas, you should think horses, not zebras -- but you need to have an idea of the sorts of names that a community actually used to tell the horses from the zebras. No image snippet is ever going to give you this kind of context.
Both these Hungarian records and most American records have a mix of surnames of different linguistic origins, with a variety of spelling conventions applied. For example, Rosalia Selem's godmother is recorded as "Dna. Elis~ Leszkovszky" (as near as I can tell, anyway). That's an abbreviated Latin honorific (domina "lady"), an abbreviated Latin given name (Elisabetha = Erzsébet), and a Slavic (most likely Slovak) surname spelled according to Hungarian orthography (sz for /s/ as in Sam). What this means for your captcha idea is that conveying the context needed for handwriting interpretation is highly non-trivial: you can't just say that the language is Hungarian, because (a) it technically isn't, it's Latin, and (b) not all of the surnames are Hungarian. This is just like labeling the language of Ellis Island arrival manifests: yes, the occupations and ethnicities and countries are in English, but that doesn't help with the names of people and places.
In the interest of not letting the best be the enemy of the good, I can concede that having Selem indexed as Salem or whatever else is actually fine. As I tell people all the time, the index served its purpose: I found the records. (Or more exactly, Family Tree's "possible duplicates" algorithm found them.) But reducing the page down to snippets requires extra work, and makes indexing more difficult, so I really don't see the point.
-
Well ... perhaps your suggestions will remove 1700s Hungarian/Slovak Catholic registers from consideration (if the idea 'flies')? I'm fine with not all collections/records being well suited to the segmented/snippet indexing method. Perhaps it will be a niche solution for resolving 'certain indexing issues'. Perhaps these advanced issues could be reserved for persons with more pertinent knowledge (such as yourself). But yes - if 3 of 5 characters can be determined with relative accuracy, then perhaps just enter ? for the others (it will still be found in searches). Perhaps add a wiki page of Latinized Hungarian name equivalents? I tend to believe average indexers can do a good job - especially if they are taught. Maybe just English/American records will be considered?
I cannot help your 'not seeing the point'. I see it ... 7 +/- days vs. 1-2 mins? But yes you are right - does the work/benefit outweigh the cost/effort? I don't know the answer to that ... What's time/effort worth? Why shoot down an idea without trying? Obviously someone thinks there's value in attempting/experimenting ... Hopefully they get good results. At least they can attempt quantifying whether snippets lead to a significant difference in indexed results (who knows perhaps they'll even test some Hungarian records)?
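The "just enter ? for the others" suggestion relies on wildcard matching at search time. The sketch below uses Python's standard `fnmatch.fnmatchcase`, where `?` matches exactly one character, to show which indexed values a search for a given name would hit. It only illustrates the principle; it is not how FamilySearch's search engine is implemented, and the helper name `matching_entries` is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List


def matching_entries(query: str, indexed_values: List[str]) -> List[str]:
    """Return indexed values that a searched name would match.

    Each '?' in an indexed value stands for exactly one unreadable letter.
    Matching is case-insensitive. Note that fnmatch also treats '*' and
    '[...]' specially, which a real index would need to escape or forbid.
    """
    q = query.lower()
    return [v for v in indexed_values if fnmatchcase(q, v.lower())]
```

For example, `matching_entries("Selem", ["S?l?m", "Salem", "Selem", "Selyem", "S?lem"])` returns `["S?l?m", "Selem", "S?lem"]`. A search for the modern spelling Selyem (six letters) would not match `S?l?m`, so one `?` per unreadable letter only helps when the letter count is right.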
-
My surname is spelled many different ways. My immediate family uses the English spelling, and our French relatives use the French spellings, of which there are many different variations. I find the same for ancestors who migrated from Europe - surnames were shortened in some cases. So it would be a problem if one spelling were chosen for all variations of a surname. "Dit" names were also used as part of French surnames.
-
Interesting - thanks for the 'Status Change' to 'Under Consideration' - this is the first I have seen. It also locked all previous comments - just checking if it prevents further comment (nope comment away). New notification of 4 hours to edit comments. Previously edited and now blank comments can be deleted - but that privilege (to delete one's own comments) has been removed. This makes communication more difficult (I frequently edit posts).
-
That's pretty cool! The rest of the changes are not. Hopefully they will work out whatever bugs are entering into their program...
But, you can edit the posts even though it has that (Edit 4 hours) message. I just edited this one.