Problem with character encoding
I created a GEDCOM file that uses UTF-8 encoding. This is the header of the file:
0 HEAD
1 GEDC
2 VERS 5.5.5
2 FORM LINEAGE-LINKED
3 VERS 5.5.5
1 CHAR UTF-8
1 FILE gendata.ged
1 LANG English
When I upload the file to FamilySearch Genealogies, everything seems to work fine. After processing, I am told that the file has 60 people. I then run Compare, which produces a view. Opening the view shows all 60 people, but all non-ASCII characters in the names and places have been replaced with other characters. For example, Tönnis now turns up as Tönnis (hopefully this shows as it should). I tried all four settings for CHAR (ANSEL, ASCII, UNICODE, and UTF-8), I tried version 7.0, and I even tried leaving out the 1 CHAR line, but all in vain.
Can someone shed some light on this and tell me how to overcome this problem?
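For reference, replaced characters of this kind are typical of UTF-8 bytes being decoded with a single-byte code page. The sketch below is a minimal illustration in Python, assuming Windows-1252 (cp1252) as the wrong decoder. That choice is an assumption; nothing in this thread confirms what FamilySearch actually does. The function names are made up for the example.

```python
def garble(text: str, wrong_encoding: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode the bytes with the wrong encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


def repair(text: str, wrong_encoding: str = "cp1252") -> str:
    """Reverse garble(): recover the bytes, then decode them as UTF-8."""
    return text.encode(wrong_encoding).decode("utf-8")


# The letter "ö" is two bytes in UTF-8 (0xC3 0xB6); read as cp1252 those
# two bytes become the two characters "Ã" and "¶".
garbled_name = garble("Tönnis")        # "TÃ¶nnis"
restored_name = repair(garbled_name)   # "Tönnis"
```

If the names in the view look like the garbled form above, the file itself is probably fine and the problem lies in how it is being read.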
Many thanks in advance,
Fred Simons
Best Answer
-
Thank you very much for your interesting answer. Inspired by your observations, I did quite a lot of experiments, and in the end I found a solution and maybe an explanation of what was going on.
I could (almost) reproduce your result. When I copy the GEDCOM lines from this website and upload them to FamilyTree, everything works fine; my view shows the correct names with no corruption at all.
So there must be a difference between the original file and the copied file. My GEDCOM file is generated with Python by writing the lines to the file. In the Python open() call, I used the encoding option set to utf-8. It seems that this option is responsible for putting some invisible information in the file. When you copy and paste the visible contents into another file, this information is missing, and the copied GEDCOM file behaves as it should. My feeling is that this invisible information (of course I have not seen it) is responsible for the slight misbehaviour of FamilyTree, and does not have any impact when the file is read by aldfaer.
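One way to test this idea is to look at the raw bytes of the file rather than at the text. The sketch below writes the same GEDCOM line to two temporary files (the file names are made up for the illustration) and checks each for a UTF-8 byte order mark (BOM), which is the most common kind of invisible leading information. As far as I know, Python's "utf-8" codec writes no BOM while "utf-8-sig" does, so a check like this may help confirm or rule out the hypothesis.

```python
import codecs
import os
import tempfile


def starts_with_bom(path: str) -> bool:
    """Return True if the file begins with the UTF-8 byte order mark."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8


def write_line(path: str, encoding: str) -> None:
    """Write one sample GEDCOM line using the given text encoding."""
    with open(path, "w", encoding=encoding, newline="\n") as f:
        f.write("1 NAME Tönnis /Müsing/\n")


with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, "plain.ged")        # hypothetical name
    bom_path = os.path.join(tmp, "with_bom.ged")       # hypothetical name
    write_line(plain_path, "utf-8")
    write_line(bom_path, "utf-8-sig")
    plain_has_bom = starts_with_bom(plain_path)
    sig_has_bom = starts_with_bom(bom_path)

print("utf-8 writes a BOM:    ", plain_has_bom)
print("utf-8-sig writes a BOM:", sig_has_bom)
```

Running starts_with_bom() on the original gendata.ged and on the copied file would show whether they differ in this respect.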
My solution to the problem is that I do not use the encoding option any more. Then it seems that no invisible information is written to the file, and everything works like a charm.
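It may be worth checking which encoding Python uses when the option is left out, because open() then falls back to a platform default rather than to "no encoding". On many Windows setups that default is a single-byte code page such as cp1252, in which case the file would no longer be UTF-8 even though the header still says 1 CHAR UTF-8. The sketch below reports the default on the current machine and compares the bytes written for one name. The use of cp1252 is an assumption for illustration, not something established in this thread.

```python
import locale


def default_text_encoding() -> str:
    """Encoding that open() falls back to when no encoding is given."""
    return locale.getpreferredencoding(False)


def name_bytes(name: str, encoding: str) -> bytes:
    """Bytes that would be written to the file for this name."""
    return name.encode(encoding)


utf8_bytes = name_bytes("Tönnis", "utf-8")
cp1252_bytes = name_bytes("Tönnis", "cp1252")

print("default encoding here:", default_text_encoding())
print("UTF-8 bytes: ", utf8_bytes)    # the "ö" takes two bytes
print("cp1252 bytes:", cp1252_bytes)  # the "ö" takes one byte
```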
Answers
-
What program are you using to create your GEDCOM file?
Have you tried importing your GEDCOM file into a different family tree program to see whether the issue is there? There are several free ones available.
You could also try opening your GEDCOM file with a text editor to see if the name errors appear in the original file.
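If a text editor is not conclusive, a short script can do a similar check. The sketch below is only an illustration (the function name is made up): it strictly decodes the file's bytes as UTF-8 and lists the lines that contain non-ASCII characters, so they can be compared with what the website shows. A decoding error would suggest that the file is not really UTF-8.

```python
def non_ascii_lines(data: bytes) -> list[tuple[int, str]]:
    """Return (line number, line) for each line with non-ASCII characters.

    Raises UnicodeDecodeError if the bytes are not valid UTF-8.
    """
    # "utf-8-sig" decodes strictly and drops a leading BOM if one is present.
    text = data.decode("utf-8-sig")
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(ord(ch) > 127 for ch in line):
            found.append((number, line))
    return found


sample = (
    "0 HEAD\n"
    "1 CHAR UTF-8\n"
    "0 @397@ INDI\n"
    "1 NAME Tönnis /Müsing/\n"
    "0 TRLR\n"
).encode("utf-8")

hits = non_ascii_lines(sample)
```

For a real file, the bytes can be read with open("gendata.ged", "rb").read() and passed to the function.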
-
The program I use is one I wrote myself, in Python. Testing my file in another program as well is a very good suggestion; I had not thought of that. So I installed aldfaer (a very popular Dutch genealogy program) and imported the file. To my surprise, it worked perfectly. No unwanted character conversion.
I use notepad++ as text editor, and it correctly tells me that my file is UTF-8. I made a minimal working example, so here is a very short UTF-8 gedcom file:
0 HEAD
1 GEDC
2 VERS 5.5.5
2 FORM LINEAGE-LINKED
3 VERS 5.5.5
1 CHAR UTF-8
1 FILE gendata.ged
1 LANG English
0 @397@ INDI
1 NAME Tönnis /Müsing/
1 SEX M
1 FAMS @397R398@
0 @398@ INDI
1 NAME Anna Margaretha /Röseners/
1 SEX F
1 FAMS @397R398@
0 @397R398@ FAM
1 HUSB @397@
1 WIFE @398@
0 TRLR
The view, as constructed by FamilyTree, looks like this:
0 HEAD
1 GEDC
2 VERS 5.5.1
2 FORM LINEAGE-LINKED
1 CHAR UTF-8
1 FILE gendata.ged
1 LANG English
1 NOTE Unified System GEDCOM Standardizer 1.0
1 NOTE AFGS
2 CONT 1 GEDC
2 CONT 2 FORM LINEAGE-LINKED
2 CONT 3 VERS 5.5.5
0 @397@ INDI
1 NAME Tönnis /Müsing/
1 SEX M
1 FAMS @397R398@
1 _LHASH 181f553e7764e16009079971510e0972
1 _HASH 29a010de7f7be8f6cfb79bb2583499c4
0 @398@ INDI
1 NAME Anna Margaretha /Röseners/
1 SEX F
1 FAMS @397R398@
1 _LHASH c4ffaadc687df2634f1ef8498b4406a3
1 _HASH 303b4b05c29edf4f07564ef5ae64187a
0 @397R398@ FAM
1 HUSB @397@
1 WIFE @398@
1 _HASH 6b983c92719b49491621cfca04f415fc
0 TRLR
It looks like the problem is with FamilySearch. But since I seem to be the only one with this problem, it is much more likely that I did something wrong. But what? Some setting or other?
Many thanks for your reaction,
Fred Simons
-
Sorry for the slow reply. Been a bit distracted in the last few days!
I tried copying your extract above and uploading it as a GEDCOM file to FamilySearch.
When I ran Compare, I did not see the same problem. The special characters display correctly.
However, I then downloaded the file from FamilySearch and the new GEDCOM file did have some corruption in it.
0 HEAD
1 GEDC
2 VERS 5.5.1
2 FORM LINEAGE-LINKED
1 CHAR UTF-8
1 FILE gendata.ged
1 LANG English
1 NOTE Unified System GEDCOM Standardizer 1.0
1 NOTE AFGS
2 CONT 1 GEDC
2 CONT 2 FORM LINEAGE-LINKED
2 CONT 3 VERS 5.5.5
0 @397@ INDI
1 NAME Tönnis /Müsing/
1 SEX M
1 FAMS @397R398@
1 NAME Tönnis /Müsing/
2 TYPE aka
1 NOTE AFGS
2 CONT 1 _LHASH 181f553e7764e16009079971510e0972
2 CONT 1 _HASH 29a010de7f7be8f6cfb79bb2583499c4
1 _LHASH 737431597219b51feff5e30627c7b141
1 _HASH 275c637f817469a69ad074b07d85a63f
0 @398@ INDI
1 NAME Anna Margaretha /Röseners/
1 SEX F
1 FAMS @397R398@
1 NAME Anna Margaretha /Röseners/
2 TYPE aka
1 NOTE AFGS
2 CONT 1 _LHASH c4ffaadc687df2634f1ef8498b4406a3
2 CONT 1 _HASH 303b4b05c29edf4f07564ef5ae64187a
1 _LHASH 731e5b024a43e52afe4f990721db8bf2
1 _HASH c4028e4598fe03061d220d0696449d21
0 @397R398@ FAM
1 HUSB @397@
1 WIFE @398@
1 _HASH ed9207d4175e5ba455772425a56cc434
0 TRLR
I then re-uploaded this file and the names appear correctly.
So clearly there is an issue but it is unclear what the full consequences are. I will seek to escalate this.