Any Explanation of Search Result Order Available? Or are there flaws in the routine?
Are there any posted explanations of how the order of search results is determined? Or can anyone from that team explain how this works? Sometimes the order doesn't seem to make sense, but that is only from not knowing how the ranking is determined in the SEARCH routine in the historical records and the FIND and FIND SIMILAR PEOPLE routines in Family Tree.
Here is an example using the FIND routine in Family Tree I ran across today that I would be interested in understanding so I can make better use of the routine: https://www.familysearch.org/tree/find/name?birth=Stord%2C%20Hordaland%2C%20Norway%7C1890-1890%7C0%7C1
There were 62 people born at Stord in 1890. I know because I have entered all of them and tried to find every single duplicate I could to merge them in an effort to find every single cousin of my wife and not miss any of them. These are entered with full birth and christening dates with birth places and christening places properly standardized. So I would expect all of these to come first in the results, but they don't. I have some theories as to what is going on, but don't know how correct these are.
Going through the first twenty results:
1-3: Have "about 1895" and birthplace is standardized to "Stord, Hordaland, Norway." Does this mean that approximate birth dates take priority over exact birth dates?
4-6: Have exact birth date in the range requested and birthplace is standardized to "Stord, Hordaland, Norway." I would have expected these to come first. Again does something about the "about" date take precedence?
7: Has birth date in the range and birth place is standardized to a place one geographical level smaller than Stord. This seems to fall correctly after 4-6.
8: Again has an "about" date, but this is actually closer to the requested range than the first three results. So why didn't it come first? The place is standardized to Stord.
9: Looks to be in a reasonable place.
10: The birth date is out of range, but the christening date is within the requested range. Does this mean that a christening date takes priority over a birth date?
11: Appears as expected.
12: Here is another "about" date which is even closer to the requested range than the first two. Does this mean that "about 1890" would be even farther down the list? How does the search routine actually see approximate dates either standardized with an "about" or a "to... from" range? The place here of Hoyland, Reppen, Voss, Hordaland, Norway at first glance looks totally off, but checking the record, it is actually incorrectly standardized as "Høyland, Stord, Hordaland, Norway" so does fit in this list.
13-14: Appear as expected.
15: The birth and christening dates fall in the range expected, but the birth place and christening place are way off. They are standardized correctly. Many more results should come before this one. Why does this one appear here?
16: Appears as expected.
17-18: Two more with "about" dates, both standardized to "Stord, Hordaland, Norway." Why don't these appear with the other approximate dates with that standard place?
19: Appears as expected.
20: Has an approximate date. The place is not standardized at all. Why would it get dropped in here when there are so many more places that are standardized with "'farm,' Stord, Hordaland, Norway" that I would expect to come before it?
Answers
-
I can give you a partial answer when it comes to the Historical Record search. I did similar searches there. This is the same as your search: https://www.familysearch.org/search/record/results?q.birthLikeDate.from=1890&q.birthLikeDate.to=1890&q.birthLikePlace=Stord%2C%20Hordaland%2C%20Norway
On this one, I used the "Show Exact Search." It has way fewer results: https://www.familysearch.org/search/record/results?count=20&exactSearching=true&q.birthLikeDate.from=1890&q.birthLikeDate.to=1890&q.birthLikePlace=Stord%2C%20Hordaland%2C%20Norway
When I clicked the box next to Exact, it gives fewer results still: https://www.familysearch.org/search/record/results?count=20&q.birthLikeDate.from=1890&q.birthLikeDate.to=1890&q.birthLikePlace=Stord%2C%20Hordaland%2C%20Norway&q.birthLikePlace.exact=on
However, there is no Exact box for the date range.
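For anyone who wants to compare these variants side by side, the query strings can be assembled in a few lines of Python. This is only a sketch: the parameter names are copied from the URLs above, while the helper name build_search_url is made up for illustration and is not part of any FamilySearch tool.

```python
from urllib.parse import urlencode, quote

BASE = "https://www.familysearch.org/search/record/results"

def build_search_url(place, year_from, year_to, extra=None):
    """Assemble a record-search URL (hypothetical helper, not an official API)."""
    params = {
        "q.birthLikeDate.from": year_from,
        "q.birthLikeDate.to": year_to,
        "q.birthLikePlace": place,
    }
    # Extra switches seen in the links above, e.g. {"q.birthLikePlace.exact": "on"}
    params.update(extra or {})
    # quote_via=quote encodes spaces as %20, matching the style of the URLs above
    return BASE + "?" + urlencode(params, quote_via=quote)

plain = build_search_url("Stord, Hordaland, Norway", 1890, 1890)
exact_place = build_search_url(
    "Stord, Hordaland, Norway", 1890, 1890, {"q.birthLikePlace.exact": "on"}
)
print(plain)
print(exact_place)
```

Changing one switch at a time and comparing the result lists is probably the easiest way to see what each switch does.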
This article and the related ones at the bottom of the article may help: "In Historical Records, the search results don't match my search"
The only article about finding in the tree that I am aware of is this one: "How can I find a deceased person in Family Tree?"
I believe that birth and baptism are treated as the same thing. You will note that there isn't a search for baptism.
Part of the issue may be that you have such a broad search.
0 -
In historical record search results exported to a spreadsheet, the first column has the heading "score" and the records are sorted by score. Examining some spreadsheets may give a sense of the scoring algorithm. But more to the point, the scores are clearly bins, and within each bin the records are in no particular order. Or rather, they are in whatever order exists in the databases.
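The binning is easy to check on an exported file. The sketch below assumes the export is saved as comma-separated text with a column headed "score", as described above; the function names are made up for illustration, and the layout of the other columns is not assumed.

```python
import csv

def group_by_score(lines):
    """Group exported rows by their "score" value.

    `lines` is an open file or any iterable of text lines. Rows keep the
    order they had in the export, so the order inside each bin is visible.
    """
    bins = {}
    for row in csv.DictReader(lines):
        bins.setdefault(row["score"], []).append(row)
    return bins

def print_bin_sizes(bins):
    """Show how many records share each score."""
    for score, rows in bins.items():
        print(f"score {score}: {len(rows)} record(s)")
```

A typical use would be `with open("export.csv", newline="", encoding="utf-8") as f: print_bin_sizes(group_by_score(f))`, where "export.csv" is a placeholder file name. Many records sharing one score would support the idea that order within a bin comes from somewhere other than the score.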
I have noticed historical record search results tend to be bunched by collections. I think this reflects their order of arrival in the database, nothing more.
Family Tree Find search results are a different matter. Sort order is influenced by when profiles were last edited. Sometimes I work with the same search results over many days. A few days into the process I need to refresh the search and the sort order has changed. All else being equal, the most recently changed profiles come last.
0 -
That's an interesting point. I was thinking that "about" just gives a range of dates and anything within that range is treated equally. I don't know a way to test that theory though.
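The theory can at least be written down precisely. Everything in this sketch is an assumption: the width of the window that "about" opens (five years either side here) is a guess, and nothing in this thread confirms that the search works this way.

```python
def about_range(year, tolerance=5):
    """Treat "about <year>" as a span of years; the tolerance is a guess."""
    return (year - tolerance, year + tolerance)

def overlaps(a, b):
    """True if two (start, end) year ranges share at least one year."""
    return a[0] <= b[1] and b[0] <= a[1]

requested = (1890, 1890)
print(overlaps(about_range(1895), requested))               # True with a 5-year window
print(overlaps(about_range(1895, tolerance=2), requested))  # False with a 2-year window
```

If the theory holds, "about 1895" would count as a full match for 1890 only when the window is at least five years wide, so the placement of "about" dates at different distances from the requested year might hint at the real width.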
0 -
Thanks for the comments. What I'm curious about is what affects the score on a search, that is, which search terms are given the most weight and how to adjust a search to influence the score sufficiently to get the results desired. I'm wondering if the score on all twenty of the results on the first page of my example is the same and the order is just random beyond that, or if there is something about the first three that gives them a slightly higher score, pushing them to the top of the list.
0 -
Q: You mention that "there were 62 born at Stord in 1890." The Find results for your search indicate many more, so either there are a ton of duplicates or the results are in error? I have not looked up the locality of Stord and do not know it, but I did notice that there seem to be sub-locations being returned in the results. Is there another geo-level that would limit the search to return the 62 you were searching for?
0 -
Those search result numbers are always misleading and I generally find them pretty useless, both in the FIND and the SEARCH functions. My sample search claims 36,918,352 "results" which is ridiculous. Stord is a good-sized island on the west coast of Norway with a current population of less than 20,000.
Even by the second page of results, place names are playing a much smaller role in the ranking, which seems to be based on Hordaland, Norway, rather than the Stord, Hordaland, Norway criterion. Dates are still influencing the ranking well.
I guess my question is to try to figure out why on page four only about 2 of the 20 results have place names of the form [place], Stord, Hordaland, Norway and the rest are [place], [place], Hordaland, Norway while on page five, 12 of the 20 results are for [place], Stord, Hordaland, Norway. I would expect all of the [place], Stord, Hordaland, Norway results to come before the [place], [place], Hordaland, Norway, which brings me back to my original question: Is there something in the result scoring that makes it obvious why the results are in this order, or is this a flaw in the algorithm?
I would expect that all other things being equal that all Stord, Hordaland, Norway results would be listed then all [place], Stord, Hordaland, Norway, results then all [place], [place], Hordaland, Norway results. But that is not what is happening.
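The ordering expected here can be stated as a sort key. This describes the expectation only, not how the Find routine actually ranks; the function name place_tier and the tier numbers are made up for illustration, and the place names are the ones mentioned earlier in this thread.

```python
def place_tier(place, query="Stord, Hordaland, Norway"):
    """Lower tier = closer to the searched place (expected order, not the real one)."""
    parent = query.split(", ", 1)[1]      # "Hordaland, Norway"
    if place == query:
        return 0                          # exactly the searched place
    if place.endswith(", " + query):
        return 1                          # a farm or locality inside Stord
    if place.endswith(", " + parent):
        return 2                          # somewhere else in Hordaland
    return 3                              # anything else

places = [
    "Hoyland, Reppen, Voss, Hordaland, Norway",
    "Høyland, Stord, Hordaland, Norway",
    "Stord, Hordaland, Norway",
]
ranked = sorted(places, key=place_tier)   # stable sort keeps ties in input order
for p in ranked:
    print(place_tier(p), p)
```

Comparing a page of real results against these tiers would show exactly where the actual ranking departs from the expectation.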
0 -
What with the rampant and ongoing mis-standardization of placenames, there are just too many variables there, so I've been trying -- and failing -- to figure out the weighting of name versus date.
I remember that when index corrections first became available, I used my great-grandmother's baptism as a search test, because her surname was mis-indexed. After I corrected it, I entered her name and birthdate (Julianna Heitler, born 1882), and in the old search, I remember her being at or near the top of the list. Now, I have to scroll down six screenfuls to get to her, twelfth on the list -- because her indexed name is spelled Juliána rather than Julianna. This puts her below two 1886 births, seven mothers with the name (with event dates ranging between 1885 and 1906), a Julius in 1886, and a Jule in 1888. It's as if my input of 1882 is essentially being ignored.
I downloaded the spreadsheet and looked at the scores, but could not figure it out: Juliánna Heidler, 1886 and Julius Heitler, 1886 have identical scores of 3.495, Jule Heitler in 1888 scores 3.485, and Juliana Heitler, 1882 scores 3.455 -- the same as three records for Julius Heitler in 1891.
So we have a name off by one letter (Heidler), with a date off by four years, scoring higher than another name off by one letter (Juliana) and an exact match on the year. OK, so surname is weighted differently; let's just stick to the given names. We have names matching the first three (Jule) or four (Julius) letters, with years off by six or four years, respectively, both scoring higher than a name that matches the first six letters as well as the last letter, and matches the year exactly.
And how on Earth can Julius in 1891 have exactly the same score as Juliana in 1882??
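One way to make the mismatch concrete is to put the quoted scores next to a simple measure of how close each record is to the search terms. The names, years and scores below are the ones quoted above. The similarity measure (Python's difflib ratio) is just a stand-in chosen for this sketch and is not claimed to be what the search uses.

```python
from difflib import SequenceMatcher

QUERY_GIVEN, QUERY_SURNAME, QUERY_YEAR = "Julianna", "Heitler", 1882

# (given name, surname, year, score) as quoted from the spreadsheet above
RECORDS = [
    ("Juliánna", "Heidler", 1886, 3.495),
    ("Julius", "Heitler", 1886, 3.495),
    ("Jule", "Heitler", 1888, 3.485),
    ("Juliana", "Heitler", 1882, 3.455),
    ("Julius", "Heitler", 1891, 3.455),
]

def similarity(a, b):
    """Case-insensitive similarity between 0 and 1 (stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def compare(records):
    """Pair each quoted score with name similarity and distance in years."""
    rows = []
    for given, surname, year, score in records:
        rows.append({
            "name": f"{given} {surname}",
            "score": score,
            "given_sim": round(similarity(given, QUERY_GIVEN), 2),
            "surname_sim": round(similarity(surname, QUERY_SURNAME), 2),
            "year_gap": abs(year - QUERY_YEAR),
        })
    return rows

for row in compare(RECORDS):
    print(row)
```

By this measure the 1882 Juliana record is the closest on both given name and year, yet it shares the lowest of the quoted scores, which is the puzzle in a nutshell.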
I think part of the reason people are having so much trouble with the new interface is that when they changed the default field from "birth" to "any", they must have devalued "birth" in the scoring. That's the only explanation I can think of for a match on half the letters and numbers (Julius 1891) scoring the same as a match on all but one letter (Juliana 1882).
1 -
Very interesting. Are you saying the following URL duplicates your Search Parameters?
She was third in the Results returned when I tried to duplicate your experience. Did you download the spreadsheet today?
Changing the Parameters to input Juliána, the language-localized version of her name, resulted in her being the top result. Perhaps this reflects a user-base expanding to be more international/localized?
Selecting the Search> More Options> Preferences to Language Options: Translated Text and taking out one n - Juliana - results in her being the top result. I hadn't considered this before - but maybe these Language Options are playing more of a role in Search Results now (makes sense - since it is a new feature).
The Results order didn't change for me when I changed FamilySearch default language to Česky.
0 -
No, I used Birth, not Any: https://www.familysearch.org/search/record/results?count=100&q.birthLikeDate.from=1882&q.givenName=Julianna&q.surname=Heitler
Which makes no sense: a birthdate is weighted more heavily when you search for it as "any"??
Speaking as one of a long line of Julias and Juliannas: the usual Hungarian form of the name is Julianna, although there is a variant Juliána (i.e. with a long vowel instead of the long consonant). In baptismal registers and their indexes, it can occur with one or two 'n's and with or without the diacritic, due to various factors like sloppiness, use of a line over the 'n' to indicate doubling, and disagreement over the "correct" Latin form of the name. "My" Juliannas weren't Catholic, so the Latin question is only peripherally relevant: it influenced the writing habits of educated people like pastors and clerks, but the Protestant registers were written in Hungarian.
The diacritic is invisible to the search algorithm: it ignores all extra marks on basic letterforms as if they were mere typographic bling.
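As an illustration of what ignoring the marks could look like, here is one common way to fold diacritics using Python's standard library. This is a generic technique, not a description of the actual FamilySearch implementation; the function name fold_diacritics is made up for this sketch.

```python
import unicodedata

def fold_diacritics(text):
    """Drop combining marks so that accented letters compare as base letters."""
    decomposed = unicodedata.normalize("NFD", text)  # "á" becomes "a" + combining accent
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(fold_diacritics("Juliána"))   # Juliana
print(fold_diacritics("Juliánna"))  # Julianna
print(fold_diacritics("Høyland"))   # Høyland (ø is a separate letter with no decomposition)
```

With folding like this, Juliána and Juliana become identical strings, so any remaining difference from Julianna is the single versus double 'n'.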
I don't think there's anything in the FS databases that identifies the language of an indexed name or of the register that was indexed. If there is anything, it's probably a single label for the entire collection, like it is in indexing: the church books from Slovakia are identified as Slovak, even though only about a quarter of them (if that) are in that language, and the Jewish registers from Hungary are identified as Hungarian, even though the majority of them are in German. I don't think this level of localization can be usefully applied to the search algorithms, and I doubt that any such thing is going on.
0 -
Incidentally, this problem does not just apply to FamilySearch. I recently took out a subscription to Find My Past and it appeared, at first glance, that results had been prioritised according to the exact place name. Later down the list, in the way Gordon illustrates, the results became less specific - say, applying to the county instead of specifically to the actual parish. However, just when I thought the results were becoming practically irrelevant to my search, some really interesting ones appeared again - after about 4 or 5 pages through!
In spite of all the faults of FamilySearch in often producing results in an unexplainable order, I still prefer doing searches here rather than at FMP - and certainly far more than at Ancestry, where I really find things hard-going.
1 -
I had some thoughts about why birth might be weighted more heavily when using 'Any' type Life Event. They boil down to the idea that records subsequent to Birth Life Event imply Birth Life Event (fictional records excepted). Any Life Event record could substitute as an implied Birth event. Perhaps this is why the default Date range field prompts Birth year as an option (default Search)? This also brings up what the 'Any' type Life Event Date Range means (something discussed on another post somewhere).
0