GEDCOM import bug regarding surname processing when ADD:ing data
When importing GEDCOM files, hyphenated surnames are imported incorrectly. An example is a line like the following, which is fully compliant with the GEDCOM standard.
1 NAME Edouard Albert/Mermet-Grandfille
After importing the file, comparing it, and viewing it, the surname is displayed properly on the left-hand side, but once you hit the ADD button, the surname becomes Grandfille only. In fact, if you have a surname like X-Y-Z, only Z remains after ADD:ing the person. If one replaces the hyphens with spaces, the exact same problem occurs. I have tried different formats, even ones not aligned with the GEDCOM standard, just in case, like
1 NAME Edouard Albert/Mermet-Grandfille/
1 NAME Edouard Albert/Mermet Grandfille/
1 NAME Edouard Albert/Mermet\-Grandfille
1 NAME Edouard Albert/"Mermet-Grandfille"
but it looks like the ADD processing disregards any character that is not a letter and only takes the last name should there be several. This is a serious issue, as hyphens and spaces are part of many surnames. I have a huge number of persons in my GEDCOM file; ADD:ing them one at a time is already going to take a very long time, and having to manually correct many of them is going to be a nightmare.
Hope engineers can have a look at it very soon.
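For reference, here is a minimal sketch of how the "/"-separated NAME value can be split without losing hyphens or spaces. It is only an illustration of the expected behavior, not FamilySearch's actual code; the function name split_name is made up for this example, and tolerating a missing closing slash is an assumption made so that the line above parses too.

```python
def split_name(value):
    """Split a GEDCOM NAME value into (given, surname, suffix).

    Everything between the slashes is kept as the surname, including
    hyphens and spaces. A missing closing slash is tolerated.
    """
    given, slash, rest = value.partition("/")
    if not slash:
        # No slash at all: treat the whole value as given names.
        return value.strip(), "", ""
    surname, _, suffix = rest.partition("/")
    return given.strip(), surname.strip(), suffix.strip()


for example in (
    "Edouard Albert/Mermet-Grandfille",
    "Edouard Albert/Mermet-Grandfille/",
    "Edouard Albert/Mermet Grandfille/",
):
    print(split_name(example))
```

With this approach the surname comes out as "Mermet-Grandfille" (or "Mermet Grandfille") in all three cases, which is what I would expect after ADD.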
Comments
-
This definitely does look to be an improper handling of surnames when creating a new person record from your GEDCOM file using the GEDCOM compare tool. FS Engineers should have a look at it.
Please be aware that the GEDCOM compare tool has some pretty serious limitations. It can miss many pre-existing records that are already in the FSFT resulting in the creation of duplicates when you use the ADD function. Another concern I have is that the misinterpreting of surnames may also be affecting the Compare tool's ability to find that surname in the existing FSFT database.
So here's hoping that a FS Engineer spots this and looks into it.
1 -
Pascal
.
I 'see' what you have tried ...
.
1 NAME Edouard Albert/Mermet-Grandfille/
or
1 NAME Edouard Albert/Mermet Grandfille/
or
1 NAME Edouard Albert/Mermet\-Grandfille
or
1 NAME Edouard Albert/"Mermet-Grandfille"
.
But ...
That said ...
.
What about the actual "Surname" field in the GEDCOM File?
.
eg.
1 NAME Given Names /Surname/
2 NPFX Prefix
2 GIVN Given Names
2 SURN Surname
2 NSFX Suffix
.
I do not know; but, I was just wondering with your tries (above) ...
Did you ALSO "Edit" that ... 2 SURN field, with those different attempts ...
Or, only the 1 NAME field?
.
You probably did ...
.
But ...
Just thinking out aloud ...
.
Brett
..
0 -
Pascal
.
FYI
.
I did find this ... within an article ...
.
------------------
<<There are few people with double surnames, and they transmit them fully, normally hyphenated, but this is a different thing (example: given name(double)="Josep Maria" + 1st surname(double)="Badia-Rovira" + 2nd surname(single)="Torres" == FULL NAME="Josep Maria Badia-Rovira Torres", listed "BADIA-ROVIRA TORRES, Josep Maria")>>
.
This person would need to have a gedcom record like this:
.
1 NAME Josep Maria /Badia-Rovira Torres/
2 GIVN Josep Maria
2 SURN Badia-Rovira,Torres
.
Notice the "," - this shows that the person has two surnames, not one. The first surname is "double-barrelled" (English term for two surnames joined with a hyphen), the second surname is normal.
.
------------------
.
I do not know if this might help.
.
Brett
.
0 -
Will play a bit with additional tags like SURN and GIVN and get back with my comments.
0 -
Thanks for the suggestion! I played a bit with the suggested additional tags (GIVN and SURN), and this seems to clarify things for the ADD function and provide the expected results. For example, I replaced a test line:
1 NAME Lucie Marie Louise/Webera-Weberb Weberc
by
1 NAME Lucie Marie Louise/Webera-Weberb Weberc
2 GIVN Lucie Marie Louise
2 SURN Webera-Weberb Weberc
and after ADD:ing the person, the first name was "Lucie Marie Louise" and the surname "Webera-Weberb Weber". Many thanks for your feedback!! I still find it surprising that without GIVN and SURN, the person details on the left-hand side are displayed properly anyway, yet not after ADD:ing the person. The "/"-separated NAME format makes the key:value pairs very clear. As such, it still feels like there might be room for improvement. Having data with about 80K persons to import, keeping the import as straightforward as possible is of high importance. Still a bit frustrated that fields like NOTE, OCCU and SOUR are disregarded by the FamilySearch import, as those are of great importance, but that's a different topic.
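With that many persons, editing each NAME line by hand is not realistic, so the workaround can be scripted. The following is a rough sketch only: it assumes the file has already been read into a list of lines, the helper names (level_of, tag_of, split_name, add_name_parts) are made up for this example, and it simply copies whatever sits between the slashes into SURN. Whether the ADD function accepts the result in every case is something I have only checked on a few test records.

```python
def level_of(line):
    """Return the GEDCOM level of a line, or -1 if it has none."""
    head = line.strip().split(" ", 1)[0]
    return int(head) if head.isdigit() else -1


def tag_of(line):
    """Return the second token of a line (the tag on level 1+ lines)."""
    parts = line.strip().split(" ", 2)
    return parts[1] if len(parts) > 1 else ""


def split_name(value):
    """Split a NAME value into (given, surname); closing slash optional."""
    given, slash, rest = value.partition("/")
    if not slash:
        return value.strip(), ""
    surname = rest.partition("/")[0]
    return given.strip(), surname.strip()


def add_name_parts(lines):
    """Add '2 GIVN' and '2 SURN' under each '1 NAME' line that lacks them."""
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        out.append(line)
        i += 1
        parts = line.strip().split(" ", 2)
        if len(parts) == 3 and parts[0] == "1" and parts[1] == "NAME":
            # Collect the sub-lines that already belong to this NAME.
            sub = []
            while i < len(lines) and level_of(lines[i]) >= 2:
                sub.append(lines[i])
                i += 1
            existing = {tag_of(s) for s in sub if level_of(s) == 2}
            given, surname = split_name(parts[2])
            if given and "GIVN" not in existing:
                out.append("2 GIVN " + given)
            if surname and "SURN" not in existing:
                out.append("2 SURN " + surname)
            out.extend(sub)
    return out


record = [
    "0 @I1@ INDI",
    "1 NAME Lucie Marie Louise/Webera-Weberb Weberc",
    "1 SEX F",
]
print("\n".join(add_name_parts(record)))
```

Running the function a second time on its own output leaves the lines unchanged, so it should be safe to apply to a file where some persons already carry GIVN and SURN.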
0 -
Slight typo in my previous post, surname was "Webera-Weberb Weberc" (with the "c" at the end, of course).
0 -
Comment 1:
This is a bit tangential to the original issue but needs to be commented on here.
As I hinted at before, if you have 80,000 names you are trying to import, chances are very high that 50,000 to 70,000 of those names are ALREADY IN THE TREE. If you use the ADD function in the GEDCOM compare tool to dump all of those person records in the tree, you will have created a massive MESS of duplicates that will have to be cleaned up and merged away, or all kinds of undesired side effects will start occurring that will seriously impact the work of numerous other patrons with FS accounts.
Each individual in the GEDCOM file must be evaluated one at a time for duplicates already in the tree. The Compare tool does a really LOUSY job of helping with this. Furthermore, whenever you add a new record to the FSFT or modify any existing records already in the FSFT, you must provide sources and justification along with your "Reason this information is correct" statement for each change or addition that you make. If you do not do this, others will come along and delete or modify all the work you did.
So if you are planning on adding 80,000 names to the FSFT database, you are likely going to have to do this well over 80,000 times (e.g., changes to different attributes may require different sources and reasons). Since the GEDCOM files do not handle sources very well at all, that means you will have to do a lot of manual source documentation for each person record your GEDCOM file touches or duplicates in the FSFT. If you do not do this you are imposing all of that work on many other FS patrons that will be impacted by all those records you are dumping into the FSFT.
This has been a very significant problem here, causing a lot of people much grief due to massive dumping of names from GEDCOM files. For the past several years there have been many, many people who have PLEADED with FamilySearch to REMOVE this capability of quickly dumping large numbers of records from a GEDCOM file into the FSFT. The longest topic discussion to ever exist in the old GetSatisfaction.com forum (over 1600 comments) was on this very subject. I have no idea why FS has ignored all requests to prevent this kind of abuse in the system.
(to be Continued…)
0 -
Comment 1 Continued:
FamilySearch FamilyTree is TOTALLY DIFFERENT from sites like Ancestry.com where everyone has their own personal trees that no one else can modify. The FamilySearch FamilyTree is a single SHARED tree with tools specifically designed to support a single record for each person who has existed in the world (at present there are 1.3 billion of them). If you create a separate detached tree in the FSFT database using the ADD function of the GEDCOM compare tool, you will see it very quickly start to disintegrate as others start merging those duplicates into person records that were already in the tree (and possibly far better researched and documented than those in your GEDCOM file). All the hint engines and duplicate identification tools in the FSFT are designed to specifically remove all of those duplicates.
And making any changes to the FSFT database without giving any sources, notes, discussions, Reasons text, or other justifications for each of those changes will cause many others (as well as yourself) a lot of grief.
So please keep these things in mind as you make modifications to the FSFT based on your GEDCOM file. That compare tool is not intended for the purpose of a person dumping "their tree" in their GEDCOM file directly into the FSFT. It is for specific improvements to be made to the FSFT based on data in your GEDCOM file. Also, the mere fact that the data came from your GEDCOM file is NOT a reason or justification for making changes. You need sources and text explaining why the information that you are changing in the FSFT is more accurate than the data that is already there.
I really wish that FS would shut down the capability of dumping thousands of names into the FSFT from a single GEDCOM file until they can refine the tool better. And most of us on the forum who have been burned significantly by people dumping GEDCOM files would be just as happy to have the GEDCOM mechanism shut down permanently.
0 -
Comment 2:
You stated that "Still a bit frustrated that fields like NOTE, OCCU and SOUR are disregarded by FamilySearch import as those are of great importance"
Note that that information *IS* there--sort of.
When you import your GEDCOM file, it is imported and placed into the Genealogies database and configured so that it is searchable in FS. This is NOT the FamilySearch FamilyTree database. It is a different one. I know that notes and some other GEDCOM fields are stored with that original GEDCOM data.
However, when you use the GEDCOM compare tool to compare and add your GEDCOM records (now in the Genealogies database) to the FamilyTree database, that information is not made available to you. It is just one of the many ways that the Compare tool BLINDS YOU to important information in the system. To see that information you'll need to open another window and do a search in the genealogies database to find the records that you imported. From there you will be able to see the NOTES and some other data from the original GEDCOM file.
Unfortunately, there are several things broken with the viewer for these records. For example, if there are too many sources or notes, the viewer can fail and will not expand the list. In the above example there are 20 sources but it won't let you see them. Also there is data displayed that is not labeled correctly (the dates just above the Notes section in the above example are for a divorce and for the temple sealing).
Also unfortunately, since the records that you imported have likely been previously imported into that same Genealogies database by other people on many different occasions over the last 20-30 years or so, it can be very difficult to discern which one is yours from the huge list of similar person records that result from the search (yet another failing of the GEDCOM import support at FS--you can't search for Genealogical database records that you specifically contributed).
0 -
Hi Jeff, I fully agree with what you stated. Being a scientist, I am usually careful about quoting sources. In the present case, the huge number of records covers a fairly specific region (a local mountainous area in the Jura, France) that in my experience is very poorly covered on FS (nothing bad with that, it was rather expected). About 60K people are unknown to FS, and over the past few years I have already added quite a few working with the FS Web interface (providing sources for each of them). Most of the records have detailed information with birth, christening, death, and burial dates and places (standardized), as well as wedding dates and places, spouses and children. It goes back to about 1600. I would therefore have loved to import SOUR and OCCU as well for those individuals, but I fully understand that information quality is critical to all, and as we all know, quality takes time, so time it will take.
1 -
Ok, so you already have a good understanding of how all of that works. That's great. A LOT of people using those tools do not, and don't realize the problems they are creating.
Again though, I really wonder if those fields (i.e., the SOUR and OCCU) are not recorded in the Genealogies database when you imported your GEDCOM. I suspect that they may be. Like many things with the Compare tool, they just aren't visible in the compares but may be visible when directly searching for and viewing them in the Genealogies database.
I am interested in this as well and would like to investigate it a bit. Could you provide an example of a person record that was imported from your GEDCOM file which had those fields populated in the GEDCOM file? If you have already transferred that example from the Genealogies DB into the FamilySearch FamilyTree, the PID for that person record in the FSFT would be useful.
0 -
Hi Jeff, I have played a bit and, thanks to your feedback pointing at the Genealogies DB, I checked there and can see the OCCU field being displayed on the right-hand side with the corresponding value. However, it does not look like it is being ADD:ed into the FS DB.
1 NAME ...
2 GIVN ...
2 SURN ...
1 OCCU ...
1 SOUR @S1@
2 PAGE ...
2 QUAY ...
The SOUR is also added, but only the SOUR record (below), with no further information from the SOUR citation (above) such as PAGE and QUAY.
0 @S1@ SOUR
1 AUTH ...
1 TITL ...
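To make the distinction between the two structures concrete, here is a small sketch that reads both: the citation under the person (the "1 SOUR @S1@" line with its PAGE and QUAY sub-lines) and the source record it points to (the "0 @S1@ SOUR" record with AUTH and TITL). The function names and the sample values are invented for illustration, and the parsing is deliberately simplistic (no CONC/CONT handling, no nested levels).

```python
def parse(line):
    """Split a GEDCOM line into (level, second token, remainder)."""
    parts = line.strip().split(" ", 2)
    level = parts[0]
    second = parts[1] if len(parts) > 1 else ""
    rest = parts[2] if len(parts) > 2 else ""
    return level, second, rest


def source_records(lines):
    """Map each source xref (e.g. '@S1@') to its level-1 tags."""
    sources = {}
    current = None
    for line in lines:
        level, second, rest = parse(line)
        if level == "0":
            # At level 0 the xref comes before the tag: "0 @S1@ SOUR".
            current = sources.setdefault(second, {}) if rest == "SOUR" else None
        elif level == "1" and current is not None:
            current[second] = rest
    return sources


def citations(lines):
    """Collect each '1 SOUR @xref@' citation with its level-2 details."""
    found = []
    current = None
    for line in lines:
        level, second, rest = parse(line)
        if level == "1" and second == "SOUR" and rest.startswith("@"):
            current = {"xref": rest}
            found.append(current)
        elif level in ("0", "1"):
            current = None
        elif level == "2" and current is not None:
            current[second] = rest
    return found


# Hypothetical sample data, not taken from my actual file.
sample = [
    "0 @I1@ INDI",
    "1 NAME Given /Surname/",
    "1 OCCU farmer",
    "1 SOUR @S1@",
    "2 PAGE p. 12",
    "2 QUAY 3",
    "0 @S1@ SOUR",
    "1 AUTH Some Author",
    "1 TITL Some Parish Register",
]

records = source_records(sample)
for cite in citations(sample):
    print(cite, "->", records.get(cite["xref"]))
```

In the Genealogies DB view only the second part (AUTH, TITL) seems to show up; the PAGE and QUAY values from the citation are the part that goes missing.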
0 -
I suspected as much. The entire area surrounding GEDCOM files is really wanting. The main thing is if you need info like that for determining if a record is better as an ADD or an ALREADY IN FAMILY TREE condition, you can at least bring up another window with the Genealogy DB records in it for reference. But it IS clumsy!
BTW, if I remember correctly, the NOTE fields are also copied into the Genealogies DB but cannot be transferred across to the FSFT. Because of its high value, that field also should always be visible during a Compare and Merge, but like so many other valuable pieces of information, it is not. Par for the GEDCOM course.
By the way, did you notice that in the citations area of my example, right under my name, they have a statement to the effect of "name withheld for privacy purposes"?
Say WHAT?