Date weighting in Family Tree Search Function
Why is there little or no weight given to the date or date range in the Family Tree search function when "exact" can be specified for names and locations? I have attached a screen shot of the search for a Johann Heinrich Horstmeier, with a date range of 1750 to 1755. The first three names have birth dates from 1780 to 1802, even though one can see the father of one of those with a birth date of 1752 that is not returned.
Answers
-
My experience has been that the date range works quite well, if there is a person in Family Tree whose information actually matches.
I replicated your search here: https://www.familysearch.org/search/tree/results?count=20&q.birthLikeDate.from=1750&q.birthLikeDate.to=1755&q.birthLikePlace=Hille%2C%20Kreis%20Minden%2C%20Westphalia%2C%20Prussia%2C%20Germany&q.birthLikePlace.exact=on&q.givenName=Johann%20heinrich&q.surname=horstmeier
I think there are two issues with that father born in 1752. One is that there is no birth place on his record at all, so he will never come up in an exact place name search. The other is that the search routine does not appear to treat Johann and Johan as the same name.
Take off that second N and uncheck exact place and you get very different results: https://www.familysearch.org/search/tree/results?count=20&q.birthLikeDate.from=1750&q.birthLikeDate.to=1755&q.birthLikePlace=Hille%2C%20Kreis%20Minden%2C%20Westphalia%2C%20Prussia%2C%20Germany&q.givenName=Johan%20heinrich&q.surname=horstmeier
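For anyone who wants to try variations like these without retyping the whole address, here is a minimal sketch that assembles the same kind of URL. The parameter names are copied from the two links above; the function name `tree_search_url` is made up for illustration, and I am only assuming the site treats a generated address the same way as one produced by the search form.

```python
from urllib.parse import urlencode, quote

BASE = "https://www.familysearch.org/search/tree/results"

def tree_search_url(given, surname, birth_from, birth_to,
                    place=None, exact_place=False, count=20):
    """Build a Family Tree search URL from the parameters seen in the links above."""
    params = {
        "count": count,
        "q.birthLikeDate.from": birth_from,
        "q.birthLikeDate.to": birth_to,
    }
    if place:
        params["q.birthLikePlace"] = place
        if exact_place:
            # Mirrors the "exact" checkbox next to the place field.
            params["q.birthLikePlace.exact"] = "on"
    params["q.givenName"] = given
    params["q.surname"] = surname
    # quote_via=quote encodes spaces as %20, as in the original links.
    return BASE + "?" + urlencode(params, quote_via=quote)

place = "Hille, Kreis Minden, Westphalia, Prussia, Germany"
print(tree_search_url("Johann heinrich", "horstmeier", 1750, 1755, place, exact_place=True))
print(tree_search_url("Johan heinrich", "horstmeier", 1750, 1755, place))
```

Changing one argument at a time (spelling, exact place, date range) makes it easier to see which change actually alters the results.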
3 -
I thought about the fact that I had checked exact place, so I replicated the search with exact place unchecked but got the same result. While the one-letter name difference may have an effect (although I did not specify exact name), I posted this question using this one example after many, many searches in which the date information was, for the most part, essentially ignored.
The question relates to the issue that the dates of 1780, 1789 and 1802 (and other dates that do not show up in the screen shot) do not match the date range of 1750 to 1755.
1 -
@Gerry Mitchell pointed out that
... The question relates to the issue that the dates of 1780, 1789 and 1802 (and other dates that do not show up in the screen shot) do not match the date range of 1750 to 1755. ...
Were we not once told that the FamilySearch enquiry screen essentially ignores date ranges? Probably in a previous community platform? The phrase "essentially ignores" may not be 100% true but is it more true than not?
Like many things in FS searches, everything works fine when there is an exact match, as @Gordon Collett suggests. But if there isn't, I think that FS follows the principle of "more is more", and doesn't even seem to prioritise by date.
1 -
If indeed the search function does not prioritize dates, why is it included as a search variable?
In my opinion, while names can have various spellings and even be completely different over time, locations and dates are hard facts that would seem to me to be more relevant search variables.
Perhaps my question could be converted to a suggestion to increase the priority given to dates in the search function.
0 -
I agree the FIND function is just strange sometimes. It works great when it works and when it doesn't, it's just not clear why not.
Here is an example of where the date criteria are clearly used and used well.
The first dozens and dozens of results are all in the birth date range of 1894 to 1894 except for two that have "about 1895" and some 1893 births that have christenings in 1894. Those exceptions are understandable.
However, when the routine runs out of exact name matches, which I asked for, instead of just stopping, it starts throwing in some close but rather random place names and the dates start jumping around.
I think it really comes down to the computer being too helpful and the search routine trying to give us what it thinks we might really want. Others here have described the problem with the FIND routine as due to the fact that all the criteria have an OR function when you think they should have an AND function. Really looking at your initial search and applying strict AND logic, you should have gotten only a couple of results because there is no one on the list of that name AND that exact place AND a birth date between 1750 and 1755 besides one with an about 1752 birth date and one with a longer name.
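To make the AND versus OR distinction concrete, here is a toy sketch. It is not how FamilySearch actually works (nobody in this thread knows that); the four records and both functions are invented purely to show how the same criteria give very different result lists under the two kinds of logic.

```python
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    birth_place: str
    birth_year: int

def field_matches(p, name, place, year_from, year_to):
    """Return one True/False per search criterion: name, place, date."""
    return [
        p.name.lower() == name.lower(),
        p.birth_place == place,
        year_from <= p.birth_year <= year_to,
    ]

def strict_and(people, **q):
    """Keep only records that satisfy every criterion."""
    return [p for p in people if all(field_matches(p, **q))]

def loose_or(people, **q):
    """Keep records that satisfy any criterion, best matches first."""
    scored = [(sum(field_matches(p, **q)), p) for p in people]
    ranked = sorted((sp for sp in scored if sp[0] > 0),
                    key=lambda sp: sp[0], reverse=True)
    return [p for _, p in ranked]

# Invented sample records, not real Family Tree data.
people = [
    Person("Johann Heinrich Horstmeier", "Hille", 1752),
    Person("Johann Heinrich Horstmeier", "Hille", 1780),
    Person("Johann Heinrich Horstmeier", "", 1752),
    Person("Anna Olsen", "Stord", 1894),
]
query = dict(name="johann heinrich horstmeier", place="Hille",
             year_from=1750, year_to=1755)

print([p.birth_year for p in strict_and(people, **query)])
print([p.birth_year for p in loose_or(people, **query)])
```

Under strict AND only the one full match comes back. Under OR, the 1780 record and the record with no birth place are returned as well, which is roughly the behaviour people are complaining about.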
0 -
@Gordon Collett suggested:
" ... Others here have described the problem with the FIND routine as due to the fact that all the criteria have an OR function when you think they should have an AND function ... "
Yet that hypothesis doesn't work when I apply it to the query that you created, Gordon. If it were correct, then we should see entries in the rest of the world with births in 1894-1894. I didn't. Then again, I didn't check all 20,000 results. (Shock, horror...) It might work with other queries but not this one, I think.
What I found interesting was that in the Search Results, I saw various birthplaces that weren't a match to Stord, Hordaland, Norway. But, looking more closely...
K2XR-CLX, Knud Knudsson Laukhammar, has a birthplace of Laukhammar, Tysnes, Hordaland, Norway - which doesn't match the search criteria. But he does have a Christening place of Stord Kirke, Stord, Hordaland, Norway - which does match the "birth" place of Stord, Hordaland, Norway. As if, in this case, the search criteria of birthplace is being applied to both the birth or the christening place. Which seems quite reasonable to me.
K8H6-M1F, Hansine Langsted, is similar - she doesn't match on the birthplace but does match on her christening.
In fact, I went through a number of the 20,000 and all that I checked matched on either birthplace or christening place - they weren't just similar, they matched (allowing ABCD, Stord, Hordaland, Norway to match Stord, Hordaland, Norway).
This match on either birthplace or christening place isn't obvious in the Search Results because they only show the birthplace. But as I said, it does seem a reasonable thing to do.
However, ignoring the date range is stupid and patronising. ("No, we know what you really want...") Just what is someone supposed to do with 20,000 results?
1 -
@Adrian Bruce1 stated: "However, ignoring the date range is stupid and patronising. ("No, we know what you really want...") Just what is someone supposed to do with 20,000 results?"
I agree. If there are no real results, just tell me.
2 -
I tried a much rarer placename on the query https://www.familysearch.org/search/tree/results?count=100&q.birthLikeDate.from=1894&q.birthLikeDate.to=1894&q.birthLikePlace=Higher%20Wych%2C%20Cheshire%2C%20England%2C%20United%20Kingdom&q.birthLikePlace.exact=on
Higher Wych is where my GG-GF comes from and is just a few houses on the border between England & Wales. There are 60 results for people born at Higher Wych, Cheshire, England, United Kingdom (exactly) in the range 1894-1894. All that I could see were born or baptised there. Most bore no relation to the birth date range of 1894-1894. There were no extraneous results from (say) Norway in the range 1894-1894. Therefore the idea that it's looking for birthplace or birth year range doesn't work in this case.
Whether it might if I take the "exact" marker off, I don't know - that gives too many results to be meaningful.
Just to be clear, I am trying to satisfy my personal need to understand whether the search is "place AND date" or whether it's "place OR date". If you're not that interested in knowing, well, I can't blame you!
1 -
I think part of the problem is that it's weighting names much higher and dates much lower than we expect or want. If I search for Johann Alexander born in 1894 to 1894, then the Alexander born in 1894 will come below the three dozen Johann Alexanders born in other centuries.
I also think that the logical AND versus OR idea applies more to the different field groups (life events, family members, etc.) than to the individual fields within a group. That is, it's not so much "birthplace OR birthdate" as "birth OR spouse OR alternate name". It weights the birthplace much too heavily to return anything from a different part of the world (barring autostandardized nonsense), but it will cheerfully return dozens of stillborn infants despite the lack of a match to a specified spouse.
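Here is a small sketch of what I mean by groups. It is only my guess at the behaviour, not anything documented, and the record, the query and the function `group_results` are all invented for illustration: every field inside a group has to match, but a record is returned if any one group matches.

```python
def group_results(record, query):
    """For each field group, True only if every queried field in that group matches."""
    return {
        group: all(record.get(group, {}).get(field) == wanted
                   for field, wanted in fields.items())
        for group, fields in query.items()
    }

# Invented example: search for a birth plus a spouse's given name.
query = {
    "birth": {"place": "Stord, Hordaland, Norway", "year": 1894},
    "spouse": {"given": "Anna"},
}
# A stillborn infant: the birth details match, but there is no spouse at all.
infant = {"birth": {"place": "Stord, Hordaland, Norway", "year": 1894}}

results = group_results(infant, query)
returned_under_or = any(results.values())   # group OR group
returned_under_and = all(results.values())  # group AND group
print(results, returned_under_or, returned_under_and)
```

With OR between the groups the infant is returned even though the spouse group fails. With AND between the groups it would be dropped.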
The other problem, of course, is the unpredictability of what text matches what. Nyiri doesn't match Nyiry, Debreczeni doesn't match Debreceni, Johann doesn't match Johan (!) -- but Dedinszky for some unfathomable reason matches Edding. It "knows" that Lajos is Louis and Ferenc is Franciscus -- but Ferencz (which is Exactly The Same Thing as Ferenc) it erroneously matches to Ferdinand, and it never, ever returns a Ferenc for Ferencz or vice versa. (Have I mentioned that they're Exactly The Same Thing?)
3 -
@Julia Szent-Györgyi - I think you may be interpreting the OR idea better than I have, with the suggestion that the logic is Block OR Block rather than OR within a Block.
I have considerable difficulty in understanding why the Blocks should be ORd together, effectively. Why, if I describe an ancestor with their birthplace AND spouse, would I want a search to find people with matching birthplace but totally different spouse? (The result of an OR). Makes zero sense to me. Is it even intentional or has the weighting element just got out of hand?
As far as matching Names goes, I think that it was possible to purchase packages that matched names. If FS really have bought one in, it might explain why it's so poor at matching names outside the culture of the authors.
2 -
I wish there was a help center article that described in at least general terms how search result scores are calculated and what determines the order in the results display when scores are identical.
The scores can be seen if you export the search results into a database. If someone had a bunch of free time and was obsessive enough about the question of search results, that person could probably run a hundred searches with minor variations, export each into a separate database, analyze the result scores (which really are all that matter when evaluating results of a search), and reverse engineer the search routine. But that won't be me.
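For whoever does have the free time, the comparison could look something like this minimal sketch. I am assuming the export is tab-separated and has a person ID column and a score column; the column names `id` and `score` and the sample rows are placeholders, so check the header row of a real export and adjust them.

```python
import csv
import io

def load_scores(tsv_text, id_column="id", score_column="score"):
    """Map each person ID to its score from one exported result list."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row[id_column]: float(row[score_column]) for row in reader}

def score_changes(before, after):
    """Score difference for every person who appears in both exports."""
    shared = before.keys() & after.keys()
    return {pid: round(after[pid] - before[pid], 6) for pid in shared}

# Placeholder data standing in for two exports of slightly different searches.
export_a = "id\tscore\nAAAA-111\t1.5\nBBBB-222\t1.5\n"
export_b = "id\tscore\nAAAA-111\t2.0\nCCCC-333\t1.5\n"

print(score_changes(load_scores(export_a), load_scores(export_b)))
```

Changing one search field at a time between exports would show how much each field moves the score.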
I also wish they had kept that old blue divider in the search result list that was labeled "The following results do not match your search criteria but they might be of interest."
"Why, if I describe an ancestor with their birthplace AND spouse, would I want a search to find people with matching birthplace but totally different spouse?" This is an example of overly helpful computer syndrome in which the program is basically saying, "I know you were looking for a different spouse, but are you sure the person you are searching for wasn't married a second time?"
1 -
Here is an example spreadsheet of the search I did with just date range and exact place. It's interesting to see that the score, which I think ranges from 0 to 5.0000, is actually quite low for all of the first twenty results. It's only 1.5. I would have interpreted that number to mean that none of these are very good matches, even though they are all perfect matches. I was expecting them all to have a score of 5.0. There is some strange math going on here.
I haven't looked at one of these for a while. Never noticed before that the main name and all alternate names are included in the results.
1 -
Ok, so I'm a little bit obsessive. Here I added that a first name should be a* to my sample search. It's interesting to see how that affected the scores and that apparently date has a bit more influence than place:
(In case you are wondering, if someone had a surname, often their patronymic is put in the first name field since there is not a middle name field. That is why Andersdatter is treated as a first name in these results)
0 -
'Just what is someone supposed to do with 20,000 results?' - especially with no sort capability provided at all.
The export capability helps (and can be automated), though it is limited to 5,000 results, which forces careful tuning of the search criteria. At least once you've got the data and put it in a spreadsheet or database, you can analyse it however you like.
Example for automated export of 100 records from offset 3800 to a TSV file (in this case, it involves a simple search for surname=Tinkham, which actually has over 8,000 results): https://familysearch.org/search/webservice/treeresults/download?count=100&offset=3800&q.surname=tinkham&fileType=tsv
(I haven't been able to locate any FS documentation for this, but I found it via Chrome Developer Tools.)
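A minimal sketch of how the download addresses for a whole result set could be generated, one page at a time. The parameter names (`count`, `offset`, `fileType`, `q.surname`) come from the link above. The page size of 100 and the way I apply the 5,000 limit are assumptions, since none of this is documented, and the sketch only builds the URLs without fetching anything (you would presumably need to be signed in for the downloads to work).

```python
from urllib.parse import urlencode

EXPORT_BASE = "https://familysearch.org/search/webservice/treeresults/download"

def export_urls(query, total, page_size=100, limit=5000, file_type="tsv"):
    """Yield one download URL per page, stopping at the assumed export limit."""
    for offset in range(0, min(total, limit), page_size):
        params = {"count": page_size, "offset": offset, **query, "fileType": file_type}
        yield EXPORT_BASE + "?" + urlencode(params)

# The Tinkham search above has over 8,000 results, so only the first 5,000 are covered.
urls = list(export_urls({"q.surname": "tinkham"}, total=8000))
print(len(urls))
print(urls[38])
```

The page at index 38 comes out as the same address as the example link above (offset 3800).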
0 -
Here is one last oversimplified example for analysis. This time I added a last name of a*.
The four best and equal scores all have a first name starting with A and a last name starting with A, fall in the correct date range, and have the exact place name match on a parent place for the full place name.
The second best score has a first name starting with A and a last name starting with A, is in the correct date range, but the "exact" place name only matches on the county.
Then the rest of the initial 20 results all have the same score. All the names match fine. The one example that looks off, the one third from the bottom, has Agdestein as part of his first name in one of his alternate names and Agdestein as one of his last names in some of his alternate names. So the routine throws all first names in all names together in one basket and all last names in all names in another when evaluating a search. Also, all the place names match exactly. But now the dates are all over the place. Even so, all the search scores are higher than the scores when no last name was included.
I wonder how high I can push the score? If I take the simplest name above and use it with exact matching and increase the levels of place name to include Leirvik, I get this:
where, strangely enough, her score is actually less than in the previous search.
OK, this is getting ridiculous. Taking off the exact match check boxes for her first and last name gives more matches.
But the additional three matches are all outside the date range. However, not doing an exact search on the names pushes Anna Marie's score from 5.1 to 6.9054003.
Taking off the exact place name check mark does not increase her score any further, but it increases the number of results from 4 to 116,222.
0 -
@Gordon Collett said (quoting me at the start)...
"Why, if I describe an ancestor with their birthplace AND spouse, would I want a search to find people with matching birthplace but totally different spouse?" This is an example of overly helpful computer syndrome in which the program is basically saying, "I know you were looking for a different spouse, but are you sure the person you are searching for wasn't married a second time?"
Just to make explicit my objections to that mythical personification of the software in case it helps people to understand something or other...
Firstly I'm not claiming that the software ever does exactly that. But it certainly does similar things. Suppose I'm searching English & Welsh census collections for an ancestor with their birthplace, approximate birthdate and spouse. Why have I added in the spouse? Almost certainly because I'm looking for something like John Smith from London and if I don't have that spouse in the search criteria, I've got no chance of recognising him. So if the software gives me John Smith with a different spouse then, even if it really is the same John Smith, I can't recognise him so it's a wasted suggestion.
I also wish they had kept that old blue divider in the search result list that was labelled "The following results do not match your search criteria but they might be of interest."
Absolutely. It could separate answers (such as John Smith with the desired spouse) from hints (such as John Smith with a different spouse).
And that's an important point - hints are excellent in the right place. But they are different from answers to a query - they should not be confused. Sometimes it feels like FS query outputs are confusing the two.
4