Contents of citations in source lists are being corrupted.
Jeff Wiseman said: I've come across a couple of these recently and it definitely looks like a broken feature or a failed experiment. In this specific case the end result is an attached source whose URL is not the same as the documented URL in its Citation quote (see lower right hand box in the below image). Furthermore, these types of changes are NOT showing up in any change history entries.
Originally a single source "A" was attached to John Buckley Claypool's 968Q-9TJ source list. When I attached it months ago, it looked like the lower left hand box in the image above. Note that the citation text and the citation URL matched.
Just recently a new hint arrived with a different index URL but the identical image record URL (top right box in the above image). But even though the ORIGINAL citation (source "A") to an index for that same image file was still there, its citation text had been MODIFIED to match the URL of the newly added hint. Furthermore, the old URL in that same citation had NOT been updated but was still pointing at the old index record!
Also note that there used to be additional useful citation information in source "A" that was also deleted (compare citation text in lower left box with lower right box which theoretically should have been the same).
In any event, something has gotten really scrambled here. This appears to be a TRUE DUPLICATE even though it has different URLs in the different citations. They appear to be using the same index entry.
Note that I am starting to see lots of these all of a sudden. The new hints coming in are of the type in the upper right hand box, arriving when a pre-existing citation already existed and has been hacked to have the new URL in its citation text without changing the URL itself.
Comments
-
Tom Huber said: It was alleged for a long time that the FS source URLs would remain the same. Then it was pointed out that there is a difference between a URL with an "ark" and a URL with a "pal" and I no longer saw anyone allege that the FS source URLs would remain the same.
Now we have this problem and a quick check reveals that a source that has the "pal" in the URL will pull up a source with a different URL (same source, but the URL changes).
It appears that the previously alleged "will remain the same" URL can change. What is going on, FamilySearch?
-
Paul said: Perhaps a point someone could raise with Ron Tanner in one of his YouTube sessions. I'm sure he was clear in a statement made some time back that the URLs for indexed FS records would never change.
-
Jeff Wiseman said:
a source that has the "pal" in the URL will pull up a source with a different URL
Tom, Ron Tanner informed us over a year ago that the old PAL styled pointer system was likely to be replaced with the more standard Archival Resource Key (ARK) system. It seems reasonable to think that the ARK format allows expanded capabilities and adherence to their current standard. No problems with that. Systems do change and improve.
https://getsatisfaction.com/familysea...
There are obviously several ways to do this changeover, some more desirable than others, and all with some side effects. Replacing the contents of all citations that are currently attached to source lists throughout the system with the new replacement URLs would be a clean way of doing that, since citations already in place and referenced via tags, etc. would behave the same functionally, and research work that was performed putting those citations in all those source lists would be left untouched.
However, that by itself does not handle the situation where the old PAL URL has also been used OUTSIDE of the FS system. For example, when I sync an old PAL type source from a record in FSFT to my Ancestral Quest database, if FS subsequently changes the contents of that citation in the FT, it does NOT change the citation in my AQ records. So when updating contents of all "in use" citations for a given PAL URL, FS must also leave the definition of the old PAL URL in the FSFT database with a redirection to the new URL.
I believe that what FS has done is along the lines I've described since I can go in to AQ and do a compare of the source lists for a record where I'll see the PAL version of the citation that is still in my AQ database and it's lined up with the new ARK citations in the FamilyTree. Trying to view the source information using the PAL URL from my AQ database still works and is correctly redirected to the new ARK URL in FS.
BUT, what is happening here is different. Firstly, it appears that an ARK type URL is being replaced with yet another ARK type URL to the same exact index data. Instead of updating the "in use" citations, only PART of the citation is updated to the new URL but NOT BOTH references to the new URL. So the citation now is kinda schizophrenic in that it is saying two different things. Furthermore, a new hint is now issued to the record for a citation that has BOTH references to the new URL being in place.
In any event, you now have the same index file data being referenced by two separate citations via the same new URL, but where one of the citations has documented text still in it referencing the OLD URL.
In any event, something is obviously broken and I've started seeing a bunch of these. Furthermore, I've seen where existing Citations have actually disappeared completely but no new hints have been offered to replace them.
And all of this that I have seen has been in the area of marriages in Ohio.
-
Jeff Wiseman said: Also note that Ron never answered my question about whether PAL type URLs stored outside of FS would continue to work once they were replaced in the FS database with ARK equivalents.
-
Tom Huber said: With the testing I did earlier today, it appears that the PAL URLs convert to the current ARK URL. That was only one instance and happened with a source that I had imported into FS FT from my AQ database.
-
Jeff Wiseman said: Right. And that is desired as long as you can still use the old PAL URL at any time in the future and it will take you to the same spot. But this business of partially replacing a PAL with an ARK/PAL hybrid citation and then adding an ADDITIONAL citation with the full ARK citation contents is not right.
I can only assume that it is a residual issue from all those related issues with broken URLs a month or so ago. Among other topics, see:
https://getsatisfaction.com/familysea...
-
joe martel said: If the URL is to a familysearch.org resource (page) then it can be redirected. So older style PAL URLs would get re-directed to the ARK counterpart. The purpose of a FS pal/ark is that it is meant to be permanent or at least resolvable by the fs.org service. Now the content there on that page may be there or not, say depending on contract or access changing. A pal/ark/url to a third party site (not FS) is outside the control of FS so it is dependent on that party to maintain/redirect or whatever.
As for the citation not showing the same as the URL in the Source I'd have to dive in. I believe that when the Source is created (say through SourceLinker) the URL and Citation are not editable but the URL is stored in the Source. That URL follows the pal/ark model and may redirect to the newest version of that resource. So what you see in that citation is dynamically created at the time the source view is rendered, via the URL to get the service there.
-
Jeff Wiseman said: Thanks for the clarifications Joe,
Yes, that is exactly how I understood things. And over the past several months, because I backup (sync) source lists for person records to my Ancestral Quest database, I have been able to observe this happening as old PAL type URLs were retired. But for existing citations in source lists, the updating of the URLs was always transparent. They were changes in place. Only if you had a copy of the earlier citation (like in my AQ source list backups) could you compare and see that both reference URLs in the citations had changed.
But a couple of things obviously have not always gone as planned. For example, in several cases when the contents of existing citations attached to source lists were “updated” to reflect a new URL, the citation text disappeared and the URL was broken:
https://getsatisfaction.com/familysea...
Then there are the cases where several citations seem to have just disappeared from the source lists where they had been attached. Furthermore, they could not be found again through search (my own experience). I can’t find any examples at the moment. They may have been just citations that lost their URL and so people detached them. Afterwards, they couldn’t be found and reattached.
So now you have at least 2 issues remaining. You have the old problem for the past couple of weeks where existing citations have only been PARTLY updated to new URLs. This source was attached in July 2018 and then modified by FS at some more recent time:
And you have the new issue where additional hints pointing to the exact same index data items are being issued resulting in TRUE duplicates in source lists. The following source was hinted and then attached in the last month or two:
Here are the two separate citations in the source list both referencing the same identical index data item at Z8SM-M56Z:
So as a result I am seeing multiple citations and hints showing up for exactly the same index data. These are TRUE DUPLICATES.
I believe that when the Source is created (say through SourceLinker)...
I believe that the source already exists, you are creating a citation to it through the SourceLinker. That citation is what gets attached to the source list as shown in the first two images above.
...the URL and Citation are not editable...
Exactly. These are the values that are not modifiable by anyone and yet THEY are the ones that sometime in the recent past have been modified, apparently by some FS batch tool.
So what you see in that citation is dynamically created at the time the source view (i.e., the citation attached to the source list) is rendered
Again, exactly--EXCEPT, in this case some tool of FS has gone in and altered them at a later date (in this case almost 2 years later).
A pal/ark/url to a third party site (not FS) is outside the control of FS
I was referring to a third party site that has a copy of an earlier pal/ark/url *FROM* FS. That FamilySearch persistent URL should continue t… [truncated]
-
joe martel said: I think I follow and agree with most of Jeff's comment.
When a FS Source is push/pulled to any third party app then all bets are off for the resulting Source that comes from the third party. Round trip of all the Source data can not be guaranteed and will probably be lossy because the sync (push/pull) of data is rarely an exact match of the data schema.
If a third party has a FS ark/pal then it is treated just as if that URL were called from FS. It follows the same long-lived conventions. You also have to consider that when that URL is called, being logged in and user rights come into play as well. But that is true whether the caller is FS or a third party.
Yes the duplicate sources are a possible result of arks/pals and republishing of collections. Here's my approach. I look at all the fields of the two Sources. If they are all the same then you can detach one of them. (I might look at the attach reason to see if one has better info than the other.) If they are both FS Sources (tree icon) and the URLs are identical then I pick the one that has the best Notes to keep. Citation doesn't matter because that is autogenerated. If the URLs are different then I might keep them both, because they are going to different resources, even though they may look the same in the views. You never know when that collection is made better (fuller index).
The issue with the citation not matching the url is weird but I'm not sure if it's a big issue, as the url in the citation still is long-lived and will redirect. In most cases internet citations (i.e. Elizabeth Shown Mills) are redundant to the URL field.
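Joe's triage steps above can be written down as a small decision function. This is only a sketch: the `Source` fields, the `triage` and `better_notes` names, and the use of note length as a stand-in for "best Notes" are all assumptions made for illustration, not FamilySearch's data model or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    # Hypothetical field names chosen for this sketch only.
    title: str
    url: str
    citation: str
    notes: str = ""
    attach_reason: str = ""
    is_fs_source: bool = True  # the "tree icon" sources


def triage(a: Source, b: Source) -> str:
    """Suggest what to do with two sources that look like duplicates."""
    if a == b:
        # Every field matches, so one of the two can be detached.
        return "detach_one"
    if a.url != b.url:
        # Different URLs may lead to different resources, so keep both.
        return "keep_both"
    if a.is_fs_source and b.is_fs_source:
        # Same URL on two FS sources: keep the one with the better notes.
        return "keep_better_notes"
    # Assumption: a case the post does not cover, so stay conservative.
    return "keep_both"


def better_notes(a: Source, b: Source) -> Source:
    # Assumption: longer notes stand in for "better" notes.
    return a if len(a.notes) >= len(b.notes) else b
```

The citation text is deliberately not used to pick a winner in `better_notes`, since the post says it is autogenerated.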
Jeff Wiseman said: Joe, thanks for your thoughts on this.
If they are all the same then you can detach one of them
That usually does not work because the hints engine sends it right back to the record and because it really DOES belong to that record, inevitably someone else will come along and reattach it (and rightfully so). The problem is that these TRUE DUPLICATES that FS is creating wind up like sediment collecting in the source lists of everything that is referenced in them.
This is indicative of URLs being retired but not being completely replaced with the new URLs and so you get these duplicate artifacts scattered around cluttering up source lists.
If the URLs are different then I might keep them both, because they are going to different resources, even though they may look the same in the views
But in this case, you have different URLS, but when you follow one, you can plainly see in the URL line of your browser where it redirects to. When THAT URL is identical to the URL of the other citation, you KNOW that they are bringing up the same exact index file (source). That is the case here.
The issue with the citation not matching the url is weird...
I would call it just plain WRONG. The documented text citation of where the source is does not match the URL of the citation. Yes, the automatic rerouting functions on the URL will ensure that you get to the correct place, but a paper copy printout of that citation is wrong because it says two different things which on paper are in conflict with each other.
Also, I'd gamble that if you have a subsequent update for the URL on one of these previously updated and inconsistent citations, the tool based update would likely fail because it is searching for the URL that it is supposed to replace. That URL is no longer there. We just haven't gotten far enough along to see such failures yet.
So is it a "big issue"? From a documentation consistency standpoint, I'd say "yes". If you view this like trash scattered through the streets of your favorite city, is it a big issue? Some people might think so. But in that analogy, any citizen can go around and clean up an area thus reducing the problem more. But in the case of these duplicate type citations, none of the "average Joes" out there (pardon the pun :-) can do anything about it. We cannot delete these URLS from the system. And since they were created using new ARK type URLS, everyone is now stuck with them cluttering up their source lists.
The system did not behave like this before. Retiring PAL type URLS was working well and clean for quite a while.
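The two problems described in this post, a citation whose text names a different URL than its URL field and two citations that land on the same index entry after redirection, can both be checked mechanically. What follows is a minimal sketch under stated assumptions: the `Citation` class, the function names, the `redirects` table and the sample identifiers are all hypothetical, and a real check would have to follow FamilySearch's own redirects rather than a local table.

```python
import re
from dataclasses import dataclass

# Matches the ARK portion of a URL, e.g. "ark:/61903/1:1:AAAA-111".
ARK_RE = re.compile(r"ark:/\d+/[0-9A-Za-z:\-]+")


@dataclass
class Citation:
    url: str   # the URL field of the attached source
    text: str  # the citation text, which embeds a URL of its own


def ark_of(s):
    """Return the first ARK found in s, or None if there is none."""
    m = ARK_RE.search(s)
    return m.group(0).rstrip(":") if m else None


def is_consistent(c):
    """True when the URL field and the citation text name the same ARK."""
    u, t = ark_of(c.url), ark_of(c.text)
    return u is not None and u == t


def resolve(ark, redirects):
    """Follow a hypothetical old-to-new redirect table to its end."""
    seen = set()
    while ark in redirects and ark not in seen:  # guard against loops
        seen.add(ark)
        ark = redirects[ark]
    return ark


def true_duplicates(a, b, redirects):
    """True when both URL fields end up at the same index entry."""
    ua, ub = ark_of(a.url), ark_of(b.url)
    if ua is None or ub is None:
        return False
    return resolve(ua, redirects) == resolve(ub, redirects)


# Made-up identifiers standing in for the retired and replacement ARKs.
old_ark = "ark:/61903/1:1:AAAA-111"
new_ark = "ark:/61903/1:1:BBBB-222"
table = {old_ark: new_ark}

original = Citation(  # URL field still old, citation text already rewritten
    url="https://familysearch.org/" + old_ark,
    text="(https://familysearch.org/" + new_ark + " : accessed)",
)
new_hint = Citation(  # both references use the new ARK
    url="https://familysearch.org/" + new_ark,
    text="(https://familysearch.org/" + new_ark + " : accessed)",
)

print(is_consistent(original))                    # False
print(true_duplicates(original, new_hint, table))  # True
```

Comparing the resolved ARKs rather than the raw URLs is what separates a true duplicate from two citations that merely look alike.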
joe martel said: Good stuff Jeff. The teams are aware of this thread. Thanks for being quite specific and detailed regarding the issues here.
-
Jeff Wiseman said:
Now we have this problem and a quick check reveals that a source that has the "pal" in the URL will pull up a source with a different URL (same source, but the URL changes)
Tom, this is actually the way it is supposed to work. When a PAL URL is "retired" and replaced on the source with a new ARK URL, and someone attempts to access that source via the old PAL URL, the FS site automatically redirects it to the new ARK URL. I.e., the old PAL URL is persistent and still works for accessing the source that it used to work for.
The problem is that something changed (at least for a short while) about the mechanism they were retiring those PAL numbers with, and as a result, your originally attached citation will still "get you there," but is inconsistent from a documentation standpoint. You also now get a new hint that is a true duplicate of the original (i.e., it takes you to the same exact index data item).
-
Justin Masters said: I'm not sure if this is part of the same problem, but for Harry Benjamin Hummel, (KP7V-1VY), the source record listed for a marriage does NOT point to his marriage record, and then in the source-linker, it's showing up without a graphic link, whereas in the preview mode, one could see the document (incorrect) allegedly pointing to his marriage record.
-
Justin Masters said: Jeff, I got confused at some of what you said, but I'm also trying to multi-task. Thank you for adding the note. I missed it by ][ that much. I looked 1 page before and after.
Part of what I also don't understand when reviewing who this record is attached to, is that it ONLY shows Harry Benjamin Hummel being attached to it, despite all the other folks in a typical marriage record being attached to the source. I wonder why it's not showing everyone.
I'll look again later, but I have some important paperwork due uh... a day or two ago I just remembered I needed to fill out.
Thank you again for looking at this! I'll look again later, I promise!
-
Justin Masters said: Jeff, I tried following your instructions again, and here's where I'm having problems. You wrote:
The digital index data for Harry will be at the URL indicated: https://familysearch.org/ark:/61903/1...
By clicking on that URL, you can see the contents of that specific index file data entry for Harry. You can see Harry's indexed data in the file and the citation to the source image where it was derived from.
I understand that part above. It's pointing to Harry's specific part of that record.
Note here again, that the image shown is NOT from the index data source. It has only been included here for convenience (and sometimes confusion :-) The citation information that points to the image source where this index data source was derived from looks like this:
This part doesn't make sense. The "View the original document" link mentioned under the picture shows the EXACT same URL as actually hovering over the picture, and I don't see ANY reference to a particular image number, etc., as shown in the graphic after those instructions (i.e., I don't see the info relating to image 712 or 710, etc., except when I'm looking at the image itself and seeing the image number in the upper left corner).
When I click on the picture or the "View the original document" link, it takes me to the picture, not the table looking thing you show. And I'm sufficiently lost that I can't figure out how you got to that info.
-
Jeff Wiseman said:
Part of what I also don't understand when reviewing who this record is attached to, is that it ONLY shows Harry Benjamin Hummel being attached to it, despite all the other folks in a typical marriage record being attached to the source. I wonder why it's not showing everyone
Exactly. That is because the index file source that is being cited (i.e., at the H5JG-R33Z ark type URL) is for only a single person. All items in a source list that are for indexed records each point to a single person data item (i.e., just the data that was indexed from the image source for that specific person). This is the normal nature of indexed data sources and contributes to what Tom Huber sometimes calls the "person-centric" handling of sources.
(This is also why in an indexed marriage record, you need at least 2 citations to the index to show evidence for a couple relationship (marriage). You need the one for the Bride and the one for the Groom. FS tried to hide this fact when setting up the discrete Couple Relationship source lists by using a single entry and kludging the title but obviously that didn't work.)
The index data source for that person has an embedded citation to a specific place/image on a film where that person's data was indexed from. But all other index data sources for individuals that were indexed from that same image source film/image number, will all have the same embedded citation pointing to it.
For your second question, the information you need is shown below:
Note, just like all of the indexed data source items, each image of the original document has its own unique persistent ARK type URL. It's just that the wrong one was attached to Harry's indexed data source entry due to the erroneous image numbering that was used in the creation of the index file.
-
Justin Masters said: So... if I understand you correctly, with the individual-based source references, doesn't that undermine the effort to show unattached people on a source record?
-
Jeff Wiseman said: I don't think so. Many index files seem to support some kinds of simple "groupings". In censuses, they are "households". In marriage records, they are typically the record or record set (all the data coming from that license, registration, and certificate collection as they show up in the image source). The source linker loads all the individually indexed names in one of these "groupings" in the index file. The "unattached persons" are just all of the index data items for any of the names in that group which have not been cited from someone's source list yet.
It's just a side benefit of the grouping concept used in a lot of index data files.
-
Justin Masters said: I guess I'm lost... (but I suspected that awhile back) When I look at the source, and then I compare who's connected there vs. viewing the people attached to that source (and only see one)... then I don't get the disconnect. But... there's a part of me that says, "Let it goooo, Justin. Find a kid who can operate the remote." LOL
-
Jeff Wiseman said: :-)
-
Jeff Wiseman said: I've also noticed that the sync'ing mechanism for sources in Ancestral Quest breaks when dealing with these corrupted sources. It will see them on the FS side, and you can sync them down to the AQ database, but then AQ cannot recognize them. Even though the corrupted source gets added to the database in AQ, in the sync window nothing has changed. It looks as though you need to sync it again. But if you do, it just keeps piling up duplicates of the corrupted source in the AQ database.
I have reported this to Incline Software, but having software all of a sudden break because it is handed an ill-formed data structure is very, very common. That's why consistency in data structures is so important. Anyway, for this reason it is very possible (and even likely) that other tools out there that support syncing sources with FS have now broken as well.
So if you are using such a tool, until this gets fixed in FS (or the API is redefined to allow for these anomalies) just be aware that your sync'ing of sources may currently have undesired side effects.
-
Jeff Wiseman said: Here is a copy of a list I posted over in
https://getsatisfaction.com/familysea...
The problem with these source corruptions appears to be related to the fouled up data contents in the sources as well, so the same names are usually showing up in both of these threads.
Bertha Marie Hull LB67-LZW
James Byron Hull LBSZ-2K9
Daisy Belle Santee L4WY-5QW
William H Smith GSG5-YXX
Bessie Mae Gray LRDJ-CXN
Harry Lathan Waugh 9H8F-DGX
Delbert Mershon KGC9-624
Goldie Frost LJY8-2WZ
Wayne Douglas O'Bryant KVGH-CGR
Cecil Fern Santee KVGH-DHV
-
Jeff Wiseman said: Here's another pair I just found (again, Ohio County Marriages 1789-2013)
Gilbert Johnson GM27-K9V
Frances C. Anderson 27BZ-WHM
This discussion has been closed.