How can I get the Search Export to include collection and Family Tree attachment information?
Hello
I have been successfully exporting search results for detailed analysis in Excel, but there are two big gaps: firstly, I cannot get the Collection to show on the export, and (much more frustratingly) I can't get any existing attachments of a record to a Family Tree Person to appear (which would allow me to filter out the people I have already investigated in depth via Family Tree).
What am I missing please?
I have investigated using the API, but it appears not to support simple extraction of Search results unless one is a formal application developer (I could go down that route, but it would be nice if there were an easier way).
Thank you
Mandy Shaw
Best Answer
-
And another update - great news: the Search TSV export now includes Collection information in 4 lovely new columns at the end:
collectionId, collectionName, subcollectionId, subcollectionName
For example:
collectionId: 4496118
collectionName: United Kingdom, Merchant Seamen Records, 1918-1941
subcollectionId: 4496118
subcollectionName: United Kingdom, Merchant Seamen Records, 1918-1941
(I haven't seen one yet where the subcollection is different from the main collection.)
Thank you FS - very helpful.
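For anyone analysing the export outside Excel, here is a minimal Python sketch that counts rows per collection and flags any row whose subcollection differs from its main collection. It assumes only the four column names listed above, a tab-separated file with a header row, and default CSV quoting; the function name is made up for illustration.

```python
import csv
import io
from collections import Counter


def summarize_collections(tsv_text):
    """Count Search TSV export rows per collection.

    Uses only the collectionId / collectionName / subcollectionId columns.
    Returns (Counter keyed by (collectionId, collectionName),
             list of rows whose subcollection differs from the main collection).
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = Counter()
    differing = []
    for row in reader:
        counts[(row["collectionId"], row["collectionName"])] += 1
        # Flag the (so far unseen) case where the subcollection is not the main collection.
        if row["subcollectionId"] != row["collectionId"]:
            differing.append(row)
    return counts, differing
```

To use it on a downloaded file: `with open("export.tsv", newline="", encoding="utf-8") as f: counts, differing = summarize_collections(f.read())` (the filename is just an example).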
Answers
-
Update:
I thought I had resolved this by importing my part of the Family Tree into RootsMagic and then using the excellent SQLite RM tools to interrogate the database, but I now discover that the import ignores sources. This is unfortunate, not least because one of the other things I wanted to analyse was whether or not the Person entries had any sources at all, which is clearly a key indicator of the quality of the information.
So (unless I go through all the imported individuals and manually pull each source across, which I suppose is doable-ish) I remain stuck.
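If the sources do eventually make it into a database, the "does this Person have any sources?" check is a simple LEFT JOIN. The sketch below uses Python's sqlite3 with a made-up two-table schema (persons, person_sources); RootsMagic's real table and column names are different, so treat it only as an outline of the query shape.

```python
import sqlite3


def build_demo_db():
    """In-memory database with a HYPOTHETICAL schema, not RootsMagic's actual tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE persons (id TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE person_sources (person_id TEXT, source_id TEXT);
        INSERT INTO persons VALUES ('P1', 'Person One'), ('P2', 'Person Two');
        INSERT INTO person_sources VALUES ('P1', 'S1');
        """
    )
    return conn


def persons_without_sources(conn):
    """Return (id, name) for every person with no attached source."""
    sql = """
        SELECT p.id, p.name
        FROM persons AS p
        LEFT JOIN person_sources AS ps ON ps.person_id = p.id
        GROUP BY p.id, p.name
        HAVING COUNT(ps.source_id) = 0   -- NULLs from the LEFT JOIN are not counted
        ORDER BY p.id
    """
    return conn.execute(sql).fetchall()
```

Swapping `HAVING COUNT(...) = 0` for a plain `COUNT(ps.source_id)` in the SELECT list gives the number of sources per Person, which is the quality indicator mentioned above.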
-
Further update:
Even without registering in any way as a developer I find (if signed in to FamilySearch) I can use the API:
Next task is to suss out the authentication piece so it works from PHP not just from the browser!
-
Cool! If you are doing a lot of data manipulation, you'll probably run into the data throttling issue at some point and will need to handle that somehow.
Question(s): when you say import, do you mean some script you have set up, or the FamilySearch built-in spreadsheet results?
I thought the built-in spreadsheet results displayed the collection link (they did at some point, unless that has changed)?
Whether that Source is attached in Tree is shown by the tree icon indicator (when it functions properly... I recall it sometimes gets broken?), but I wouldn't know how to include that in your spreadsheet. I think I recall the API documentation mentioning something about not liking applications that scrape/mine data too heavily?
-
Thanks.
Re import, I'm talking about RootsMagic's automated import from FamilySearch, which would be perfect for my purposes if it included sources!
Re the tree icon, yes, that info is great in the browser, but sadly it definitely doesn't reach the export spreadsheet. And anyway I really want to know which Person(s) is/are attached to this source, not just that there is at least one. If I knew which sources were on which Persons from the Family Tree end, I would have what I wanted, hence my investigations in that direction. I have now submitted an application for API access (making clear that I understand the concern about mining data too heavily), since I seemingly cannot get any further while I can only authenticate in the main application...
Re the Collection, the individual source is linked to in the spreadsheet, but the Collection from which it comes does not appear to be exportable despite being shown on the browser search results.
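Once you know which sources sit on which Persons, the filtering step is just an inverted index. A minimal sketch, assuming you already have each Person's attached record URLs from somewhere (the person IDs here are placeholders) and that record arks look like `.../ark:/61903/1:1:XXXX-XXX`; arks are compared by that ID so that `www.` prefixes or query strings don't cause false mismatches.

```python
import re
from collections import defaultdict

# ASSUMPTION: record arks follow the ark:/61903/1:1:XXXX-XXX pattern.
ARK_RE = re.compile(r"ark:/61903/(1:1:[A-Z0-9-]+)")


def ark_id(url):
    """Reduce a record URL to its ark ID; fall back to the raw URL if no ark is found."""
    m = ARK_RE.search(url)
    return m.group(1) if m else url


def sources_to_persons(person_sources):
    """Invert {person_id: [record_url, ...]} into {ark_id: sorted [person_id, ...]}."""
    index = defaultdict(set)
    for person_id, urls in person_sources.items():
        for url in urls:
            index[ark_id(url)].add(person_id)
    return {ark: sorted(pids) for ark, pids in index.items()}


def unattached_records(record_urls, index):
    """Keep only search-result records not yet attached to any Tree Person."""
    return [url for url in record_urls if ark_id(url) not in index]
```

The same index also answers "which Person(s) is/are attached to this source?" directly: `index[ark_id(url)]`.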
-
You might also take a look at GEDCOM.io, or whichever link there leads to the GEDCOM developer community. You might (hopefully) get more/better input on your ideas/needs there. I can't recall whether they have a Sources tools library/code available...
-
Thanks, but I don't think it's possible to export a Gedcom from Family Tree except via RootsMagic etc., and since that import drops the sources, they wouldn't be in the Gedcom either. If you know better, please say!
Anyway I await with interest the result of my application for API access.
-
Bit of extra info for anyone wanting the same sort of thing.
I discovered today by pure chance that if you pull the application/json data from a Record ark URL, e.g. https://familysearch.org/ark:/61903/1:1:MQ4X-1GY, it identifies the collection, which is very useful. It does not, however, identify any attached Family Tree Person.
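A rough sketch of that fetch using only Python's standard library: it builds a GET request with an `Accept: application/json` header, and then walks whatever JSON comes back for keys mentioning "collection", so it makes no assumption about the response layout. Authentication is NOT handled: the optional Cookie pass-through (reusing a signed-in browser session) is purely my assumption and may not be how FS wants it done.

```python
import urllib.request


def ark_json_request(ark_url, cookie_header=None):
    """Build (but do not send) a GET request asking for the JSON form of a record ark."""
    headers = {"Accept": "application/json"}
    if cookie_header:
        # ASSUMPTION: a signed-in browser session cookie is accepted; unconfirmed.
        headers["Cookie"] = cookie_header
    return urllib.request.Request(ark_url, headers=headers, method="GET")


def find_keys(obj, needle, path=""):
    """Recursively collect (path, value) for dict keys containing `needle` (case-insensitive)."""
    hits = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            child = f"{path}/{key}"
            if needle.lower() in str(key).lower():
                hits.append((child, value))
            hits.extend(find_keys(value, needle, child))
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            hits.extend(find_keys(item, needle, f"{path}[{i}]"))
    return hits
```

Sending the request would then be `json.load(urllib.request.urlopen(req))`, followed by `find_keys(data, "collection")`, subject to authentication and throttling.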
-
One example of the benefit of knowing the Collection for a particular Record: where the Record has no usable dates on it (as is frequently the case), your analysis can at least use the Collection's from- and to-dates as bounds.
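A minimal sketch of that fallback in Python. It assumes the Collection's date range can be read from a trailing year range in its title (as in "United Kingdom, Merchant Seamen Records, 1918-1941"); titles without years simply give no bounds. The function names are illustrative only.

```python
import re

# ASSUMPTION: the collection title ends in "YYYY-YYYY" (hyphen or en dash) or a single "YYYY".
RANGE_RE = re.compile(r"(\d{4})\s*[-\u2013]\s*(\d{4})\s*$")
SINGLE_RE = re.compile(r"(\d{4})\s*$")


def collection_year_bounds(collection_name):
    """Return (from_year, to_year) parsed from the title, or None if there are no years."""
    m = RANGE_RE.search(collection_name)
    if m:
        return int(m.group(1)), int(m.group(2))
    m = SINGLE_RE.search(collection_name)
    if m:
        year = int(m.group(1))
        return year, year
    return None


def effective_year_bounds(record_year, collection_name):
    """Use the record's own year if known, otherwise fall back to the collection's range."""
    if record_year is not None:
        return record_year, record_year
    return collection_year_bounds(collection_name)
```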
-
Update, in case useful to anyone, re getting the browser-accessible APIs to work in an automated manner.
I ended up doing this in Python, using the excellent getmyancestors module as a basis.
I have successfully automated everything I needed to access (NB all read-only) via this route:
JSON: Persons, Sources/Source Descriptions, Changes, Collections, *FT Search, Notes, Memories, Record Arkid information (to identify the Collection and last edited timestamp).
TSV: *Record Search results.
*There is a FS limit of 5000 downloadable entries on these searches; also, throttling may well be encountered on Record Search results and needs to be handled properly.
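Handling both limits is mostly a loop: stop at 5000 entries, and back off and retry when throttled. The sketch below is generic; `fetch_page` and the `Throttled` exception are hypothetical stand-ins for however your client fetches one page and detects throttling (typically an HTTP 429, possibly with a Retry-After value, which I am assuming rather than quoting from FS documentation).

```python
import time

MAX_RESULTS = 5000  # the FS download limit mentioned above


class Throttled(Exception):
    """HYPOTHETICAL signal that fetch_page raises when the server throttles a request."""

    def __init__(self, retry_after=None):
        super().__init__("throttled")
        self.retry_after = retry_after


def fetch_all(fetch_page, page_size=100, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Page through results via fetch_page(offset, count), capped at MAX_RESULTS."""
    results = []
    offset = 0
    while offset < MAX_RESULTS:
        count = min(page_size, MAX_RESULTS - offset)
        for attempt in range(max_retries + 1):
            try:
                page = fetch_page(offset, count)
                break
            except Throttled as exc:
                if attempt == max_retries:
                    raise
                # Honour a server-supplied delay if given, else back off exponentially.
                if exc.retry_after is not None:
                    delay = exc.retry_after
                else:
                    delay = base_delay * 2 ** attempt
                sleep(delay)
        results.extend(page)
        if len(page) < count:
            break  # short page: no more results
        offset += count
    return results
```

Passing `sleep` in as a parameter is only there so the retry logic can be exercised without actually waiting.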
(Incidentally, if anyone needs to pull a gedcom from Family Tree, be aware that getmyancestors a) has this as its main purpose, and b) includes the Sources in the gedcom.)
-
In case anyone was tempted to try out the 'getmyancestors' mechanisms, they no longer work since FS changed the authentication setup a few days ago. I can still do what I want, but have had to switch the automation to Selenium, involving a great deal more hassle than the old method (and I haven't yet managed to get it to support arkids).
-
Update - now supports arkids, and I have managed to lose most of the complexity in the Selenium approach (+ take advantage of Selenium to collect the numbers of records, both overall and per collection, for a particular Records query).
-
Further update: getmyancestors is now working again.