General database query
Hello,
I'm trying to get statistical information about the most common names (first names and surnames) by age and country.
I think FamilySearch would be an excellent tool for producing those stats, but as far as I can tell there's no actual way to do this, so I'm asking whether any of you know of a way to get all this data.
Best Answer
-
I would imagine you would need to submit a formal request for assistance with this, since ordinary users do not have anything resembling this sort of reporting/analytics access. @Ashlee C. can you help here please?
1
Answers
-
I would have thought censuses were a good place to start. I'd say you were probably better off applying to the authorities in the individual countries to see if this summary information is available (and potentially for more recent censuses than are publicly available at detail level). The FS research wiki may well help identify those authorities. I definitely wouldn't use the FS Family Tree in any way, too inaccurate, incomplete, skewed towards those families that are either LDS or happen to have been researched, and anyway we as users have no real analytics access to the FT (or any other FS) database.
3 -
Censuses or birth registration indexes, I'd suggest. (Lots of countries either don't have censuses or have destroyed them after using them for statistical purposes - Australia is an example of the latter).
Purely in the interest of helping you firm up on your ideas - what do you mean by "age"?
If there was a way to access a snapshot of the UK (say) today, age would enable you to analyse when names were used and therefore in what proportion. However, there is no such snapshot (or rather, there is, it's the census which can't be released until 100y have passed).
If you look at birth indexes and still want to analyse when names were used, then you'd need to look at the year births were registered.
Just as an incidental, I'm fairly certain that the General Register Offices of England & Wales, and of Scotland do produce annual analyses of name usage for births - this enables newspapers to track the emerging and declining popularity of names as prompted by fashion. How many administrations do similar, I've no idea…
2 -
For the U.S., the Social Security Administration has (extensive) given name data sorted by birth year, and they make it all available to the public. For surnames, the Census Bureau is the place to look.
As Adrian pointed out, many countries destroy census enumerations after tabulation — but that tabulation often does include name statistics. I don't know whether there's any website that attempts to collect such data in one place, but there are sites that claim to offer worldwide name statistics; perhaps you could start with some of those.
I agree with Mandy that FS's Family Tree would not be a good source of statistics: it would be highly skewed toward "Mormon" and "famous", and the counts would be highly inaccurate due to duplicate profiles and misspellings.
2 -
Thank you all for your replies, but that's not exactly what I'm looking for…
To answer the question of what I mean by "age": I apologise, I meant by year. I'm attempting to build a database of the most frequent names per period, in this case periods of 20 years.
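As a rough illustration of the 20-year grouping described here, the following is a minimal sketch only. It assumes the data is already available as (name, year, country) tuples, which is a hypothetical layout and not an export format offered by FamilySearch; the function names and the sample rows are invented for the example.

```python
from collections import Counter, defaultdict


def period_start(year, width=20):
    """Return the first year of the width-year period that contains year."""
    return (year // width) * width


def top_names_by_period(records, width=20, top_n=3):
    """Count names per (country, period) and keep the top_n most frequent.

    records is an iterable of (name, year, country) tuples (assumed layout).
    """
    counts = defaultdict(Counter)
    for name, year, country in records:
        counts[(country, period_start(year, width))][name] += 1
    return {key: counter.most_common(top_n) for key, counter in counts.items()}


# Invented toy rows, purely to show the shape of the result.
sample = [
    ("John", 1541, "England"),
    ("John", 1555, "England"),
    ("Thomas", 1559, "England"),
    ("Jean", 1547, "France"),
    ("John", 1561, "England"),
]
result = top_names_by_period(sample)
```

Here `result` is keyed by (country, first year of the period), so the three English rows from 1541 to 1559 fall in the 1540 period and the 1561 row falls in the 1560 period.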
Censuses only cover the last few centuries, and my goal is to have these frequencies from at least the 10th century onwards, by country/area/location.
I know Family Tree may not be the best source, but it's the most complete one I know of that could get me to my goal, which is why I'm interested in getting the data from this website.
0 -
I honestly would not consider the data on FT to be of sufficiently high quality, /especially/ pre-census information, to meet what sound like your needs. I wonder if church or other religious records might be an option for some countries/periods of time?
1 -
Honestly, there are SO many duplicate profiles in the FSFT, with more being created daily, that the data would not be reliable.
1 -
@Paul11102 - re "my goal is to have them since at least the 10th century and by country/area/location."
Never mind the mechanism, you need to seriously reconsider the scope and feasibility of your objectives. Parish Registers didn't begin in England & Wales until 1538 - and I don't think England & Wales was particularly behind the times. ( See https://en.wikipedia.org/wiki/Parish_register )
There is, for the vast majority of English & Welsh folks, virtually no evidence of their names before parish registers. Yes, there are name sources before 1538, such as manorial records, but they are few and far between. Since they aren't online but held in various archives across the country, it would be a workload probably equivalent to a college dissertation to do just England & Wales. Plus you need the statistical tools to normalise the English & Welsh data versus the French (say) in order to gauge relative frequencies of Jean and John, say.
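To make the normalisation point concrete, here is a minimal sketch of one simple approach: converting raw counts into each name's share of all records for that country, so that samples of different sizes can be compared. The name lists are invented toy data, and real sources would also need correcting for the survival bias discussed in this thread, which this sketch does not attempt.

```python
from collections import Counter


def relative_frequencies(names):
    """Map each name to its share of all records in the sample."""
    counts = Counter(names)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Invented toy samples of different sizes (not real data).
england = ["John", "John", "Thomas", "Jean"]
france = ["Jean", "Jean", "Jean", "Pierre", "John", "Pierre", "Jean", "Louis"]

eng = relative_frequencies(england)
fra = relative_frequencies(france)
```

With shares rather than raw counts, John in the English sample and Jean in the French sample can be compared directly even though the French sample is twice as large.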
As you go further back, so the surviving records with names get statistically skewed to those people who were important enough to be found on charters, etc. Plenty of (upper class) Norman names - very few Anglo Saxon names after the Norman Conquest.
I urge you to study the types of source documents first, identify how they match up to your overall ambition, where they are held, understand their limitations and only then decide how you could extract the data.
3 -
Yes! I'm aware of all these issues, but once again, I've found that this website is by far the most complete database for achieving my goal.
This is not scientific research; it's a hobby project to get stats for names by year and region.
Therefore, the precision and rigour of the database are not important; the objective is to have an estimate.
Cognate names aren't an issue for me either, since I will have the frequency of Jean in France and John in England, but also the frequency of Jean in England and John in France.
So once again, my question is: is it possible to query the whole FamilySearch database?
0 -
The last I read, I think something less than 15% of the billions of records on FamilySearch are indexed to make them searchable by name.
There is a pilot program for full-text search, but that currently covers only a few sets of records in a couple of locations.
1 -
If I might throw my hat into the ring with a minor warning: the list of all names that have been presented in digital records is not ideal. There are some names that have been entered into the digital record that do not actually exist as real names. Example: if you are able to obtain a list of all surnames in the G.R.O. registers for England and Wales, you will find that some, but actually very few, are corruptions of real surnames. If you then use this list to test surnames that appear in England and Wales censuses or church records, you will find a lot of surnames that do not match. You will be able to select those that are worthy of further scrutiny, and with practice, they might be correctable. Expanding this to other parts of the world will be a huge undertaking.
To summarize: There's a whole load of stuff in there that's not valid. You need to be prepared for that.
5 -
@Paul11102 Further to my previous comment, there is another factor that would skew your results if you were to use digital records 'as found'.
In some cases, especially in records of a family group such as censuses, a person might wrongly inherit the surname of the previous person in the list. There are many possible reasons for how the error might have been introduced, but the effect from your point of view would be to create more instances of the erroneous surname, and fewer of the correct one.
0 -
@Re Searching All your points are clearly good ones. But the fundamental issues with both FSFT and FS Records as sources of this information go much deeper: they simply don't come anywhere near either the coverage/completeness or the accuracy that would be needed to research the overall stated objective formally (which is in my view impossible). The OP has since said that they are just after a big dataset to analyse for hobby purposes, so I guess size is more important to them than completeness or accuracy. But even access to the data is something FS may or may not be able/prepared to provide for them (I honestly suspect the answer will be no, especially given that this is not a formal research project).
2 -
For the US there are all sorts of databases and statistics on names.
Just google it:
https://www.google.com/search?q=most+common+names+by+census+year&rlz=1C1GCEA_enUS813US813&oq=most+common+names+by+census+year&gs_lcrp=EgZjaHJvbWUyBggAEEUYOTIHCAEQIRigATIHCAIQIRigATIHCAMQIRigATIHCAQQIRigATIHCAUQIRigAdIBCDY5NjZqMGo0qAIAsAIB&sourceid=chrome&ie=UTF-8
I have to assume for many other countries you can also find similar stats.
0 -
Dennis, I already pointed to the SSA and Census.gov for U.S. name statistics. The problem is, the original poster is apparently looking for much older data, and that simply doesn't exist.
There are compilations of names from various places in various time periods, such as the Prosopography of the Byzantine Empire and the Dictionary of Medieval Names from European Sources, but they're not statistical sources: since neither the creation nor the survival of records comes anywhere within a planetary orbit of completeness, such compilations do not focus on frequency.
2 -
The Terms of Use page lists several contact options for requesting use of materials outside the norm.
1