Request to do a batch-correction of dates in a collection
Hi, there is a collection of data for Colombia,
"Colombia, DAS Civil Registration Alphabetic Cards, 1914-2011", Database. FamilySearch. https://familysearch.org : 6 December 2024.
https://www.familysearch.org/en/search/collection/5000074
About 20% of its records were wrongly assigned dates in the 19th instead of the 20th century (e.g. 1858 instead of 1958).
A simple tabulation of dates of birth shows a suspicious bulge of births in 1840-1869 that makes no sense.
Source of numbers:
Please take a look: it would be very easy to batch-correct those dates on your backend. Users can't edit the transcriptions in that collection.
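For anyone wanting to reproduce the kind of tabulation described above, here is a minimal Python sketch. It assumes you already have a list of indexed birth years (getting them out of the collection is not covered here), and SAMPLE_YEARS is made-up data for illustration only, not figures from collection 5000074.

```python
from collections import Counter

# Hypothetical sample: NOT real data from collection 5000074.
SAMPLE_YEARS = [1853, 1858, 1945, 1950, 1953, 1958, 1960, 1962, 1848, 1970]

def births_by_decade(birth_years):
    """Count birth years per decade (e.g. 1853 is counted under 1850)."""
    return Counter((year // 10) * 10 for year in birth_years)

def share_in_range(birth_years, lo=1840, hi=1869):
    """Fraction of birth years falling in [lo, hi], inclusive."""
    years = list(birth_years)
    if not years:
        return 0.0
    return sum(lo <= y <= hi for y in years) / len(years)

if __name__ == "__main__":
    counts = births_by_decade(SAMPLE_YEARS)
    # Crude text histogram: one '#' per record in each decade.
    for decade in sorted(counts):
        print(f"{decade}s {'#' * counts[decade]}")
    print(f"share born 1840-1869: {share_in_range(SAMPLE_YEARS):.0%}")
```

On real data, a decade histogram like this is what makes an implausible bulge in the 1840s-1860s stand out.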
Cheers,
-J
Answers
I know nothing about Colombian records but since no-one else has responded, I shall try to help. Firstly, I know that the usual response to queries like yours, @JuanZuluaga3, is to ask for the URL of some source records in that collection with issues.
Secondly, if you are seeing 1858 when it should be 1958, then that sounds like an issue with the date standardisation routines. I emphasise "sounds like" - it could be something entirely different, which is why sample URLs are crucial to find out what's actually going on. If it really is an issue with the date standardisation routines, then, if I understand correctly, FS cannot do a correction in their backend database because the fault lies in the code that generates the 1858 (say), not with any fixed data. The code doing the generation appears to be called at various times, so correcting the back-end data will only work until the next time the standardisation runs.
From the Wiki article on this collection:
Computer Aided Indexing (CAI)
Some or all of the indexed records for this collection were created using Computer Aided Indexing (CAI) from records digitized by FamilySearch or its partners. Each record indexed using CAI will include the message, "This record was indexed by a computer" with a link to report errors. For more information on CAI see FamilySearch Computer Aided Indexing.
@Áine Ní Donnghaile - ahhh, thank you. So another potential failure mode would be OCR being confused between an 8 and a 9… So it further emphasises that @JuanZuluaga3 needs to supply some URLs of records that have gone wrong so they can be investigated.
The URL to the search results is in the OP and shows results with dates in the 1800s:
I'm questioning the title of the record set. On the Wiki page, the dates are 1937-1990, while on the individual records, the dates are 1914-2011.
I know some of the newer collections have dynamic titles; the title may change as the algorithm interprets the collection.
@Áine Ní Donnghaile - sorry, I need to justify or clarify my request a bit more, I think. I don't think that URL in the original post is enough, because, unless I'm missing something, we don't know which, if any, of those 1800s dates is wrong. The engineers will, I assume, be looking for specific records with an explanation of what the error is, and why it's known to be an error, e.g. the guy would be 150 years old if the date is in the 1800s rather than the 1900s.
Without those examples and explanations, I doubt the engineers can get going.
An unexplained bulge in the numbers in some period is a good quality check but it doesn't help to identify what the issue is.
I do confess that I'd missed the oddity of the years in the collection title but that's probably for 2 possible reasons - firstly, I don't understand what "registration" means in this context (specifically, how do birth dates relate to the data?) and secondly because archivists and I appear to frequently disagree over what the years in a collection title might signify, so I tend to ignore the years listed.
Good morning @Adrian Bruce1
Perhaps I should clarify my thoughts on the dates. Whether the collection covers 1914-2011 or 1937-1990, I find it odd that many background checks were performed in the 1800s.
With an index-only collection, we're in the dark. The collection is new; we may see images later.
@Áine Ní Donnghaile - agreed. It's all a little bit odd based on what we see and what we are told... And that's before we even think about the various possibilities for errors that we agree exist...
I've put it on my watch list. Perhaps it will show up as available at an FSC/AL and I can see what is actually there.
@JuanZuluaga3 Dates being off by 100 years is a common error stemming from the year on the document being written with only two digits, especially when the record is AI-indexed. The code chooses the wrong value from a list of possible interpreted dates. It can happen during standardization or in a system migration. Just like the place name errors, we are tracking affected collections. Some may require a targeted fix as you described, but usually when I report these I am told the focus is on bulk corrections.
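To illustrate the failure mode described above, here is a minimal Python sketch of expanding a two-digit year. It is not FamilySearch's actual standardisation code, which isn't described in this thread; the century list and the 1900-2011 plausibility window are hypothetical choices for illustration.

```python
def candidate_years(two_digit, centuries=(1800, 1900, 2000)):
    """Every four-digit reading of a two-digit year such as '58'."""
    if not 0 <= two_digit <= 99:
        raise ValueError("expected a two-digit year")
    return [century + two_digit for century in centuries]

def naive_pick(two_digit):
    # The failure mode: simply take the first (earliest) candidate.
    return candidate_years(two_digit)[0]

def windowed_pick(two_digit, earliest, latest):
    """Return the single candidate inside [earliest, latest], else None.

    None means zero or several candidates fit, i.e. a human should check the card.
    """
    fits = [y for y in candidate_years(two_digit) if earliest <= y <= latest]
    return fits[0] if len(fits) == 1 else None

if __name__ == "__main__":
    print(naive_pick(58))                 # 1858
    print(windowed_pick(58, 1900, 2011))  # 1958
    print(windowed_pick(5, 1900, 2011))   # None: 1905 and 2005 both fit
```

Returning None when more than one candidate fits, rather than guessing, is one way to route genuinely ambiguous years to a human instead of silently picking the wrong century.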
@Adrian Bruce1 said "we don't know which, if any, of those 1800s dates is wrong. The engineers will, I assume, be looking for specific records with an explanation of what the error is, and why it's known to be an error, eg the guy would be 150 years old if the date is in the 1800s rather than the 1900s."
How about this:
- Colombia's Departamento Administrativo de Seguridad (DAS) was created in 1960 and was disbanded in 2011 (https://en.wikipedia.org/wiki/Administrative_Department_of_Security). If the paper cards that were microfilmed have "DAS" in the heading, we can be sure that the background check was done between 1960 and 2011.
- For any card, if background_check has a date in the 18xx range, it is an error. If it is in the 19xx range but before 1960, it is suspect (and potentially very interesting!*) and the raw image should be studied. (*Supposedly, the archives of the office that preceded DAS were burned in the 1960s.)
- If the background_check is wrong, it is very likely that the birth date is also wrong and should be fixed by adding 100 years.
- I'm sure that your database technicians can search for records selecting a range of dates for background_check; the best I can do is to select a date for "any" between 1800 and 1860. https://www.familysearch.org/en/search/record/results?count=100&q.anyDate.from=1800&q.anyDate.to=1860&q.anyPlace=Colombia&q.recordCountry=Colombia&c.collectionId=on&f.collectionId=5000074
- When you ask "Juan needs to supply some URLs of records that have gone wrong so they can be investigated", do you mean something like this?
- https://www.familysearch.org/ark:/61903/1:1:6R73-3Q4S?lang=en
@SerraNola says "…usually when I report these I am told the focus is on bulk corrections…". The search copied above (1800 ≤ any ≤ 1860) produces "Historical Record Filtered Search Results (1,137,556)". Bulk correction seems the way to go.
I understand that there will be a small proportion of wrong records that will not be fixed by this process (i.e. births before 1800).
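The rule proposed above can be sketched in Python roughly as follows. This is a hypothetical illustration, not a FamilySearch tool: the function and class names are invented, only years are handled (not full dates), and treating check years after 2011 as suspect is an added assumption.

```python
from dataclasses import dataclass
from typing import Optional

DAS_START, DAS_END = 1960, 2011  # DAS existed 1960-2011, per the Wikipedia link above

def classify_check_year(year: int) -> str:
    """Classify a background-check year using the proposed rule."""
    if year < 1900:
        return "error"    # 18xx: cannot be a DAS background check
    if year < DAS_START or year > DAS_END:
        return "suspect"  # 19xx before 1960 (or, by assumption, after 2011): study the image
    return "ok"

@dataclass
class Proposal:
    birth_year: int
    check_year: int
    status_after: str  # classification of the shifted check year

def propose_fix(birth_year: int, check_year: int) -> Optional[Proposal]:
    """Propose a +100-year shift for records whose check year is in the 1800s."""
    if not 1800 <= check_year <= 1899:
        return None  # the rule only fires for 18xx background checks
    new_check = check_year + 100
    # 18xx births are shifted too; births before 1800 are left alone,
    # which is the small residue the post accepts will not be fixed.
    new_birth = birth_year + 100 if 1800 <= birth_year <= 1899 else birth_year
    return Proposal(new_birth, new_check, classify_check_year(new_check))

if __name__ == "__main__":
    print(propose_fix(1853, 1853))
    print(propose_fix(1958, 1975))  # None: check year already in the DAS era
```

Running propose_fix(1853, 1853), the values on the record linked above, proposes 1953 for both dates but classifies the shifted check year as "suspect", because 1953 is still before 1960. So a blanket +100 shift can produce dates that fail the same DAS test, and the proposals would probably need review rather than being applied blindly.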
@JuanZuluaga3 - thanks for that URL, that is indeed what I believe the technicians need.
Having said that, I am unconvinced that the answer to the definite oddities is a blanket alteration of 18xx to 19xx. We have had way too many mess-ups of automatic corrections that have made things worse.
Certainly, that URL https://www.familysearch.org/ark:/61903/1:1:6R73-3Q4S?lang=en shows something wrong. The birthdate is supposed to be in December 1853 with an Event of a Background Check in 1853, the same year. It seems hugely unlikely that an infant less than a month old will be the subject of a background check. But if both dates should be 1953, the same applies… And if the birth is in 1853 and the background check in 1953 - well, that's just about possible but unlikely… (Oh - and the Marital Status appears to be "Married", which suggests that it's not an infant who is being checked.)
So it's not obvious what the values should be - I wouldn't be at all surprised if there is no date on the record card for the background check and so the Event Date has just been generated from the only date on that record, i.e. the birth date. But that's just a wild guess.
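The reasoning above, trying each century reading of the two dates and asking whether the person's age at the check makes sense, can be sketched in Python. The age thresholds (15 and 90) are hypothetical, and ages are approximated from years alone, so this is only a rough plausibility test.

```python
from itertools import product

def readings(year: int):
    """The year as indexed, and the same year one century later."""
    return (year, year + 100)

def plausible(birth: int, check: int, min_age: int = 15, max_age: int = 90) -> bool:
    # Whole-year approximation: months and days are ignored.
    # min_age and max_age are hypothetical thresholds, not from any source.
    return min_age <= check - birth <= max_age

def reading_table(birth_year: int, check_year: int):
    """List (birth, check, approx_age, plausible) for every century combination."""
    return [
        (b, c, c - b, plausible(b, c))
        for b, c in product(readings(birth_year), readings(check_year))
    ]

if __name__ == "__main__":
    # Indexed values for record 6R73-3Q4S as described above: birth 1853, check 1853.
    for row in reading_table(1853, 1853):
        print(row)
```

For birth 1853 and check 1853, the four readings give approximate ages of 0, 100, -100 and 0, and none passes the test, which is consistent with the point that it's not obvious what the values should be.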
@SerraNola - there are serious questions in my mind about this collection. It might be that checking the actual cards would explain what's going on - whether we understand the contents of the cards, whether the issue is date standardisation of 2-digit years, whether the OCR is off. (Or "all of the above"…) Someone, I suggest, needs to audit the collection by cross-checking with the actual cards. This makes me wonder why we can't see the cards that give the indexes. I have a horrible idea that we can't see them because the films would include unindexed images of people who are still alive. At least with "normal" OCR'd stuff, we can cross-check image to index - here we can't and therefore, since we can't validate it, and since there are weird things like the index on that URL, I question the value of this collection…
Oh, by the way, I know my Spanish is non-existent, but seeing names like "Juan Lucas C. C. Restrepo Ibaza Número 79.485" on https://www.familysearch.org/ark:/61903/1:1:6R9Y-GJTV?lang=en or "Juan Carlos Hernandez de la 10 Nom" on https://www.familysearch.org/ark:/61903/1:1:6R73-5GBQ?lang=en makes me suspicious.
And is "Número" really a Spanish surname? (1,800 indexes in this collection with that as a surname, including "El Número Número", who sounds like a Spanish mathematician.)
And on https://www.familysearch.org/ark:/61903/1:1:658C-FVHZ?lang=en we have "Salve Cliberto Número" who must be in possession of a TARDIS as they were born on 16 March 1953 but had a background check in 1920… (Born in 1953? Surely this index should not be released to be visible…)
@SerraNola It seems that Juan's tag of you didn't take on this additional thread about this collection.
@Adrian Bruce1 Images for this collection may have previously been available but now removed for privacy issues. Either way, it is always a concern to consider AI-indexed records being used as sources without verification. The sample Wiki image suggests birth dates could be recorded in many ways, and even correct transcriptions may introduce errors in record details or searches, as can names and places. I urge users to seek additional sources for confirmation when using these records.
Áine Ní Donnghaile, you are right, I did not see the additional thread.
@JuanZuluaga3 The event type on these records is "Background Check" and "Criminal Background Check" is in the title in record details. Why they chose to label the collection as Civil Registration is beyond me. I will request a correction. As for your desire to have the year of birth bulk corrected, I can only put it in the queue as there are so many other collections with this same error.
Thank you for helping to improve our records.








