How big is 'my' tree?
Lyle Clugg said: Most of us picture our family tree as a majestic oak, starting with us at the bottom and spreading wider with each new generation we add. Let’s face it: my family tree more realistically resembles a banyan tree. It generally tops out at four to eight generations, unless there is a well-proven line of royalty that may go back thirty or more generations. Each of our great-great-great-grandparents’ descendant lines spreads the base of the tree, and when we add in the spouses and their ancestors and descendants, we have a thicket of interconnected humanity.
With my PC and Legacy Family Tree, I can determine with a single click how many people I’m related to, or at least those I’ve identified. In FS Family Tree, when I attempt to add a relative, I usually discover that they are already in the tree. With a single click, I may add thousands of individuals to ‘my’ family tree. I no longer know how big my tree is.
My interconnections on the tree eventually end. I’m not related to everyone on the tree, at least not yet. There are many banyan trees represented on FS-FT. I would love to see each of these trees interconnected with each other. The eventual goal would be that there are no independent trees. Everyone on the tree would be connected to everyone else on the tree.
Another problem becomes quickly evident. When I search for a person to determine if they should be added as a new person or simply connected to one of my existing relatives, I may find a couple or dozens of records which later prove to be the same person. These records need to be merged to preserve the integrity of the tree. Most of these personal records appear to have been automatically created by the recording of a single event – birth, christening, marriage or death. So now the goal has to be – connect everyone and do it correctly, which includes removing all duplicates. It is a lofty goal, but it can’t be accomplished without knowing how we are doing.
I know my request won’t be the highest priority item on Ron Tanner’s list, but I at least hope it will be added to the list. For those of us who live by numbers and statistics (I was a math major in college), please write a program that administration would run periodically to determine:
- What is the number of ‘independent’ trees in the database?
- What is the largest independent tree – how many individuals does it contain?
- What is the average size tree?
- What is the median size tree?
- How many trees include no more than three people? (This ‘tree’ would have been created by a birth record – mother, father and child)
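The questions above are all statistics over the connected components of the relationship graph. As a minimal sketch only, assuming the database could be exported as a list of person IDs and a list of person-to-person relationship pairs (the names `people`, `relationships`, `component_sizes` and `tree_statistics` are hypothetical and are not part of any FamilySearch tool), they might be computed like this:

```python
from collections import Counter
from statistics import mean, median


def component_sizes(people, relationships):
    """Return the size of every 'independent tree' (connected component)."""
    parent = {p: p for p in people}

    def find(x):
        # Walk up to the root of x's group, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Every relationship joins the two groups its endpoints belong to.
    for a, b in relationships:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Count how many people share each root.
    return list(Counter(find(p) for p in people).values())


def tree_statistics(people, relationships):
    """Answer the five questions for a non-empty set of people."""
    sizes = component_sizes(people, relationships)
    return {
        "independent_trees": len(sizes),
        "largest_tree": max(sizes),
        "average_size": mean(sizes),
        "median_size": median(sizes),
        "trees_of_three_or_fewer": sum(1 for s in sizes if s <= 3),
    }


# Hypothetical toy data: one tree of five, one of three, one isolated person.
people = ["A", "B", "C", "D", "E", "F", "G", "H", "I"]
relationships = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"),
                 ("F", "G"), ("G", "H")]
stats = tree_statistics(people, relationships)
print(stats)
```

This uses a union-find structure so that each relationship is processed once. Whether that would scale to the real database is not something the sketch tries to answer.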
These statistics, and possibly many others, would help to measure the ‘health’ of the tree. They could be published on the blog or in existing newsletters. The trend month to month would help determine if we are truly on the road to achieving the goal of creating a unified record of humankind or if we are just adding records that have no connection.
Another statistic that would be of interest to individuals would be, “How big is my tree?” How many individuals am I connected to? If I were a member of the ‘biggest’ tree, you wouldn’t have to calculate it, just report that as of XXXX date, the number was X. Otherwise, the routine would have to look through the database to see where my connections end.
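The "look through the database to see where my connections end" step is a graph traversal. Here is a rough sketch, assuming relationships are available as simple pairs of person IDs (the function name `my_tree_size` and the sample IDs are made up for illustration):

```python
from collections import defaultdict, deque


def my_tree_size(start, relationships):
    """Count everyone reachable from `start`, including `start`."""
    # Build an undirected lookup: person -> set of directly linked people.
    neighbours = defaultdict(set)
    for a, b in relationships:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Breadth-first search outward until no new people turn up.
    seen = {start}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for relative in neighbours[person]:
            if relative not in seen:
                seen.add(relative)
                queue.append(relative)
    return len(seen)


# Hypothetical sample: "me" is linked to three people; X and Y are a separate twig.
sample = [("me", "mother"), ("me", "father"),
          ("mother", "grandmother"), ("X", "Y")]
print(my_tree_size("me", sample))
```

For a member of the largest tree, a stored count from the last periodic run could be reported instead of repeating the traversal, as suggested above.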
Family Tree is great, and getting better all the time. I'm hoping more information about the tree will make it even better.
Comments
Jade said: Some considerations:
-- The longest pedigrees in the tree database are most likely to be mostly wrong, going through undocumentable/imaginary central Asian names, Roman Empire, "House of Troy," and much earlier times.
-- Just because you can connect a documented ancestor of yours to someone in the tree database who has many other connections does not mean that the existing pedigree or the supposed descendants are documented in any way.
-- In many ways the existing tree database is composed largely of fiction. User-submitted trees, IGI entries, and a computer algorithm's combining and linking of some of these in the 1990s are among the causes.
You can find some of the statistics you are looking for by searching this message board for a recent message using the term "primary tree". But the statistics cannot tell you anything at all about genealogical accuracy -- which is achieved by patient research, mostly in brick-and-mortar repositories, finding evidence of person-by-person relationships. Remember that "sources" which may be fictional genealogies (books, newsletters, websites, trees) are not enough without your human brain's evaluation of the actual material in the source -- is it applicable as to dates, places, relatives, life-path; is there actual documentation or just assertions, etc.
Ben Baker said: I believe the response Jade is referring to was Randall Johnson's on 2/26/13 on this thread https://getsatisfaction.com/familysea... I've copied it below for convenience, because apparently GetSatisfaction won't open the response since there were several after it.
"There are connected components of various sizes in 'the tree'. A connected component is a set of nodes (or vertexes) that have relationships with each other but are isolated from other connected components. By far the largest number of connected components are of size 2 and 3. Presumably, these were created from name extraction of marriage (size 2) or birth (size 3) records and from church membership records. As of the 1st of February there were 47,611,113 connected components of size 2 and 132,469,998 connected components of size 3. (There are also about 40M isolated nodes with no relationships). Together these small sized components make up significantly more than half (~530M) of the approx. 950 million nodes in the tree. The largest connected component has 275,520,574 nodes in it, roughly 1/3 of the whole tree. The next largest connected component only has 6,527 nodes."
Lyle Clugg said: Thanks, Ben, for the link to the other discussion. Those statistics are exactly what I'm looking for. I would add a few more counts to the list (averages, medians, etc.), but the ones you have are great. From your statistics, it looks like there are over 200 million 'twigs' on the tree. By actively pruning the tree of these twigs, the quality can be greatly improved. I'm guessing that a large number of them could be connected to the tree simply by doing a search for possible duplicates. I recently worked to clean up one couple on my tree. I found a total of two dozen duplicates between the two of them. Most of the time I could easily identify the extraction record that created them. The ultimate goal would be to have zero twigs in the database, and the only way to know if the goal has been accomplished is to have a set of regularly updated statistics.
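The arithmetic behind the quoted figures and the "over 200 million twigs" estimate can be checked in a few lines. This is only a sanity check on the numbers quoted from Randall Johnson's post. The "about 40M" isolated nodes and "approx. 950 million" total are rounded in the original, so the results are approximate, and the variable names are just for illustration.

```python
# Figures as quoted in the thread (1 February 2013 snapshot).
size2_components = 47_611_113
size3_components = 132_469_998
isolated_nodes = 40_000_000     # "about 40M" in the quote, so approximate
total_nodes = 950_000_000       # "approx. 950 million" in the quote
largest_component = 275_520_574

# People sitting in components of size 1, 2 or 3.
small_nodes = 2 * size2_components + 3 * size3_components + isolated_nodes

# Number of such small components, i.e. the 'twigs'.
small_components = size2_components + size3_components + isolated_nodes

share_small = small_nodes / total_nodes
share_largest = largest_component / total_nodes

print(f"people in small components: {small_nodes:,} ({share_small:.0%})")
print(f"small components (twigs):   {small_components:,}")
print(f"largest component share:    {share_largest:.0%}")
```

With these rounded inputs the small components hold about 533 million people, in line with the quoted ~530M, and there are about 220 million twigs. The largest component works out nearer 29% of the total than a full third.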
I agree wholeheartedly with Jade that the ultimate goal is to have a perfect tree, but you have to start somewhere. I obtained a GEDCOM file from a cousin four years ago, and I've been working to clean it up ever since. The thing is, for every one he got wrong, he may have done fifty correctly. Most of his errors are from the 1600s and 1700s, where he obviously copied trees from Ancestry.com. It would be senseless to throw away years of research just because he got a few wrong.
The crowd sourcing model of FS-FT is designed to eventually get the best tree possible. Other researchers have suggested that sections of the tree that have been well researched should be 'frozen' to prevent changes, except those changes that are well sourced and have administrative approval. Anyone, at any time, should still be able to add to discussions and comments.
Although I'm an amateur genealogist, I still have a passion for getting things right. If I can't prove a relationship, I don't add it to the tree, or if I do add a person, I record my concerns in a discussion. Most of us don't have the time or resources to do the brick-and-mortar repository search. The internet has been a boon to obtaining original documents. Exercising care is required to produce good results, but it doesn't mean we can't use the tools we have at our disposal.
This!
I would like to see a frequency graph: how many million treelets of size 1, 2, 3 ... X million are there now? How has Family Tree grown since 2013?
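A frequency table like that is straightforward to produce once the size of every treelet is known. Here is a small sketch, assuming a plain list of treelet sizes as input (the sample list and the function names `size_frequencies` and `text_histogram` are invented for illustration and are not real Family Tree data or tooling):

```python
from collections import Counter


def size_frequencies(component_sizes):
    """Return (treelet size, number of treelets of that size), sorted by size."""
    return sorted(Counter(component_sizes).items())


def text_histogram(frequencies, width=40):
    """Render the frequencies as a simple text bar chart."""
    largest = max(count for _, count in frequencies)
    lines = []
    for size, count in frequencies:
        # Scale each bar to the most common size; always show at least one mark.
        bar = "#" * max(1, round(width * count / largest))
        lines.append(f"{size:>6} | {bar} {count}")
    return "\n".join(lines)


# Made-up sample: two singletons, three pairs, six triples, one treelet of 12.
sizes = [1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 12]
freqs = size_frequencies(sizes)
print(text_histogram(freqs))
```

With real data spanning sizes from 1 to hundreds of millions, grouping the sizes into ranges or using a logarithmic scale would probably be needed to keep the graph readable.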
This discussion has been closed.