What's the point of asking for search parameters that will be ignored?
Not long ago, I entered a range for year of death as 2012-2014. The very first thing in the results was a 2009 obituary.
Today, I was looking for other records mentioning someone I knew had died in Québec in 1831. To allow for uncertainties in recording, I specified Canada, 1830-1832. Among the 45 items in the results, I got four death records after 1841 in Massachusetts.
What's the point of this? When I say 1830-1832, it's because I KNOW it is in that range. Scrolling past four useless hits is no big deal, but it's rather discouraging when the number of results is in the thousands, and I know from experience that the most likely are NOT at the top of the list. Same when I KNOW the Groleaus lived in Canada and every search in that branch has Gurolla in Mexico on the first page.
Answers
-
I find FamilySearch handles the issue of getting unwanted results from a specific search better than other websites I use.
To narrow down results from the thousands sometimes produced to the handful one would hope for, you need to use wildcards, "Exact" searches, filters on specific types of event and/or filters on individual collections.
I have yet to see an example posted here that, in spite of still containing a limited number of results that appear to fall outside of the search parameters, cannot be reduced to include a very limited number of (mostly relevant) results. There are many complaints along the same lines as yours, but people seem very reluctant to illustrate the precise problem (e.g. with URLs linking to the Search page in question).
Please post some examples here, as I'm sure other users will be able to advise on strategies that would enable you to cut down on all those unwanted items that are popping up.
2 -
I can generally narrow things down, but still: today I specified a known death date of 2003 and a 1934-1936 birth in Michigan and the FIRST item in the results was a 2005 obituary. The second was a Michigan naturalization in 1940. Neither of those could possibly be the person I was looking for, so why offer them unless there is nothing else? And in 559 items, it's pretty hard to believe one that couldn't possibly match might be the closest "maybe."
0 -
As suggested previously, you really need to provide a clear example of your search inputs to illustrate your problem here. If you don't want to provide details of the individual on which you are making the search I'm sure you could replicate the issue using another name, though probably different dates.
However, from what you are telling us above, including a birth-year range (especially together with a place name, Michigan) will not help in finding a record for the 2003 death, if indeed that is the subject of your search. Remember, for a search on both birth and death details to find either the death or the birth record being sought, the indexed record would have to include both birth and death details.
For example, if I am searching for a 2003 death record of a relative, I would only provide the assumed birth date range, as the death record would be very unlikely to include an indexed birthplace. In some cases, it might prove better to omit birth details completely, as the age shown in the death record could be totally wrong.
If searching for a birth record, make sure a date of death is not included in the search, as this will cause confusion as to what type of record you are searching for. If I search via the FamilySearch logo link on the Person (Details) page, I find all inputs are carried across to the Results page. This is not usually helpful, so I immediately delete marriage detail (unless that is what I am looking for, of course) and narrow my search criteria so only one type of record at a time is included in my search criteria. Again, as previously suggested, the use of filters, "Exact" and wildcard searches can cut out most of the unwanted items that are otherwise produced on the Results page(s).
I find the issue more difficult to deal with when using Ancestry, though Find My Past is probably a bit easier to work with. So, just experiment with inputting less detail to the Search page, though in some cases you will have to add more! Your example gives a hint as to where you might be going wrong, but there is no substitute for providing an actual example: (1) to provide evidence that proves your point, or (2) in order to gain help from experienced users on alternative search strategies that will help in reducing (if not completely eliminating) the problem you are having.
0 -
The point is the search routine is trying to be as helpful as possible despite the fact your search has absolutely no good results. If the first few results don't appear to have any connection with your search, there really is no use going beyond the first page of results. Things won't get any better.
If you post the URL of a confusing looking search, then people can help explain why that search gave the results it did so you can learn how to best use the search. For example, this search: https://www.familysearch.org/search/record/results?count=20&q.deathLikeDate.from=1830&q.deathLikeDate.to=1832&q.deathLikePlace=canada&q.surname=Groleaus
which is for last name Groleaus, died 1830 to 1832 (or buried), in Canada gives these first few results:
Looking at these, the last name apparently gets a couple of points for starting with Gr, and the place gets points because they are in Canada. There is no death or burial date on the record so that is ignored. I didn't search using birth information so that is ignored.
Yes, these are bad matches. If you export them, you can see the Match Score which will be in a 0.0 to 5.0 range. The higher the score, the better the match. The Hints routine requires a very high score to display a hint. The search routine has a very low bar. These first three results all have a score of 1.6760801 which is pretty rotten. The 71st result has a score of 1.0775801. The question is why does it get any points at all when apparently the only possible points are from the last name starting with Gr?
Since the results are sorted by match score, the best matches are on the first page. A necessary skill in searching is to learn how to set the search parameters and filters to get what you want at the top of the page. There must be some value in these poor results because this is pretty much the style of all search engines these days. Google almost always gives over a million if not a hundred million results. When was the last time you bothered to look at page 85,123 of a Google search?
But again, as others have requested, if you post the URL of a search result that looks goofy, we could all examine what is going on and give suggestions on how to get better results.
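For anyone who would rather assemble or pick apart such a URL than copy it from the address bar, here is a minimal sketch. The field names (`q.surname`, `q.deathLikeDate.from` and so on) are copied from the example URL above; the helper names `build_search_url` and `read_search_url` are my own, and I am only assuming the site treats a hand-built URL the same as one produced by the Search page.

```python
from urllib.parse import urlencode, urlsplit, parse_qsl

# Address of the results page, taken from the example URL in this post.
BASE = "https://www.familysearch.org/search/record/results"

def build_search_url(params):
    """Join the base address and the URL-encoded search fields."""
    return BASE + "?" + urlencode(params)

def read_search_url(url):
    """Return the search fields of a results URL as a plain dict of strings."""
    return dict(parse_qsl(urlsplit(url).query))

example = build_search_url({
    "count": 20,
    "q.deathLikeDate.from": 1830,
    "q.deathLikeDate.to": 1832,
    "q.deathLikePlace": "canada",
    "q.surname": "Groleaus",
})
print(example)

# Listing the fields makes it easy to see exactly what was asked for.
for field, value in read_search_url(example).items():
    print(field, "=", value)
```

Reading a posted URL back into its fields this way shows at a glance which boxes were filled in and which were left empty.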
As a side note, I do wish someone from FamilySearch on the search algorithm team would give a presentation at RootsTech on exactly how the search engine weighs and uses each search parameter to calculate that score. I have to say the logic behind it looks quite abstruse at times.
1 -
I apologise. I believe the programmers must have changed the algorithm, as I have not experienced this behaviour previously. In a simple search, using "Exact" only matches, I looked for instances of a John Smith born in Michigan in the 1930-1934 period. The first 50 (out of 80) results were matches (albeit most were from census records), but the remainder were for individuals for which there was either no birth detail attached, or they were born outside the 1930-34 period. See https://www.familysearch.org/search/record/results?count=20&offset=40&q.birthLikeDate.from=1930&q.birthLikeDate.to=1934&q.birthLikePlace=michigan%2A&q.birthLikePlace.exact=on&q.givenName=john&q.givenName.exact=on&q.surname=smith&q.surname.exact=on.
Without making inputs relating to death and marriage, I have been used to finding that all but the last two or three results matched my expectations. There has definitely been some change that causes results merely to be prioritised (even if "Exact" boxes are checked), instead of excluding those that do not comply with my inputs.
There have been a lot of responses from moderators lately saying something like: "Your problem has been passed to the relevant section / engineers for attention." Perhaps a moderator would care to take such action regarding this issue - so at least it might be brought to the engineers' / programmers' attention, even though I doubt we will be provided with the courtesy of a response from somebody in a position to advise!
0 -
Another consideration is that sometimes, there aren't any good matches because the search algorithm doesn't play well with (FS's own) indexing instructions about not assuming surnames. For example, if the indexers followed their instructions and indexed "baptisavit Joannes filius Georgii Smith" as child = Joannes (no surname) and father = Georgii Smith, then searching for Joannes Smith will not turn up this baptism, come heck or high water. You have to leave the main surname field blank in order for it to be a match -- which of course makes the search much more likely to turn petulant on you ("Consider adding a last name to avoid this message").
0 -
I don't mind as much including "almost" items further down in the list. Or including a record that doesn't have a birth date as a possible match for the birth date range I have stated. But when I specify death between 1932 & 1938, putting a birth date of 1965 at the beginning of the list is irritating.
0 -
Just take that really poor first match as a signal that you have no matches, that is, there are no records in the database for your search. If you find that you have a great match farther down the list, I would be really interested in seeing that example to try and refine my personal model of how things must be working.
I would assume that record with the 1965 birth date is included because the search routine is completely ignoring any empty search fields. It is definitely not sitting there doing math such as "birth date is greater than the requested death date, therefore eliminate record." That is one of the reasons the Hints engine acts so differently from the Search engine and really gives much better results. Not only does the Hints engine look at all vital information on a person, it also looks at all family relationships.
0 -
But if I say birth 1932-1938, that is not an empty field. And the offered record also does not have birth empty, but 1965. Of course, it's possible that the person has no record of birth, but that's rare in twentieth century USA. So I tend to look through the first page, maybe the second. Can never coax myself to look through all the pages when there are dozens even after specifying exact surname. Which I hate to do, because even in twentieth century USA, not only is handwriting AND spelling often terrible, but some indexers don't seem to be very careful.
In any case, if there is a record with no birth date, I would think it should be higher in the list than one with a birth date out of the specified range. That's not the sort of math in your example; it's a simple inequality without which there's no point in even having an input field.
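The ordering I have in mind is roughly the following. This is only a sketch of the idea, not a claim about how FamilySearch is written; the function name `birth_rank` and the three sample records are made up for illustration.

```python
def birth_rank(birth_year, low, high):
    """0 = birth year inside the range, 1 = no birth year recorded, 2 = outside the range."""
    if birth_year is None:
        return 1
    return 0 if low <= birth_year <= high else 2

# Made-up records: one out of range, one with no birth year, one in range.
records = [
    {"name": "A", "birth_year": 1965},
    {"name": "B", "birth_year": None},
    {"name": "C", "birth_year": 1934},
]

# Requested birth range: 1932-1938.
ordered = sorted(records, key=lambda r: birth_rank(r["birth_year"], 1932, 1938))
print([r["name"] for r in ordered])  # ['C', 'B', 'A']
```

The in-range record comes first, the record with no birth year second, and the 1965 record last. Dropping everything with rank 2 would turn the same test into a strict filter.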
This "hints engine" sounds like it is worth looking at. Where do I find it?
0 -
The second hit, Francis R Williamson, with a matching birth date, is an obvious better match than the first item, with no birth date, first names out of order, and a different surname. At least it was on the first page this time.
0 -
The "hints engine" that Gordon's talking about is just whatever algorithm it is that generates the "Record Hints" on Family Tree profiles. (It is a lot stricter than Search - Records, but nevertheless gets stuff wrong All The Time. It can't tell apart the two generations with identical names that I have about five generations up, it loves to suggest people with the same names but from the wrong place and with the wrong religion, and it hasn't a clue which single-letter differences Make All The Difference and which ones are Exactly The Same Thing.)
It'd take some study using the spreadsheet-downloading function to figure out the exact details, but my understanding is that the search fields are all combined with a logical "or", not "and": each match increases the score of the record. Non-matches do not decrease the score; they just don't add to it. My impression is that matches on people's names -- or parts of names -- are worth more points than matches on dates or places, but having names in the same order is given hardly any weight at all. In your Francis Robert Williamson example, none of the first 100 results have "Francis Robert" in that order with everything spelled out, but the record that it put first has "Robert Francis", all spelled out in full. It is therefore considered a better match to the given name inputs than the abbreviated "Francis R". (Why the full + partial match of "Robert Francis Williams" is given greater weight than the partial + full match of "Francis R Williamson", I haven't a clue. More study of spreadsheets needed.)
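To make that guess concrete, here is a toy scorer. Everything in it is an assumption for illustration: the weights are invented, the field names are mine, and both sample records are given the same surname so that the question of equivalent surnames stays out of it. It is not FamilySearch's algorithm, just the "matches add points, non-matches add nothing" idea written down.

```python
# Invented weights: names count for more than dates or places.
WEIGHTS = {"given": 3.0, "surname": 2.0, "birth_year": 1.0, "place": 1.0}

def toy_score(record, query):
    """Add points for every field that matches; a non-match simply adds nothing."""
    score = 0.0
    wanted = query["given"].lower().split()
    have = set(record.get("given", "").lower().split())
    # Each requested given name found anywhere in the record earns a share; order is ignored.
    score += WEIGHTS["given"] * sum(name in have for name in wanted) / len(wanted)
    if record.get("surname", "").lower() == query["surname"].lower():
        score += WEIGHTS["surname"]
    low, high = query["birth_range"]
    year = record.get("birth_year")
    if year is not None and low <= year <= high:
        score += WEIGHTS["birth_year"]
    if record.get("place", "").lower() == query["place"].lower():
        score += WEIGHTS["place"]
    return score

query = {"given": "Francis Robert", "surname": "Williamson",
         "birth_range": (1929, 1931), "place": "Michigan"}
# Hypothetical records loosely modelled on the two results discussed above.
first = {"given": "Robert Francis", "surname": "Williamson",
         "birth_year": None, "place": "Michigan"}
second = {"given": "Francis R", "surname": "Williamson",
          "birth_year": 1930, "place": "Michigan"}

print(toy_score(first, query))   # 6.0
print(toy_score(second, query))  # 5.5
```

With these made-up weights the record with both given names spelled out, but no birth date, still edges out the record with the matching birth year, which is the same kind of ordering seen in the real results.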
Regarding "Of course, it's possible that the person has no record of birth, but that's rare in twentieth century USA": you're forgetting that it's not enough for the record to exist. Not all records are online; of the ones that are online, only a small fraction are indexed; and the ones that are indexed are not necessarily on FamilySearch. You need every single one of those things to be true in order for a record to show up in a search on FS.
0 -
Thanks for the example search. Here is the exported version of the first page of results (You'll probably have to open it in a new screen to actually read anything).
It is kind of instructive. First off, I was wrong about the scale. I guess I don't know what the top of the scale here is, but here the first result has a score of 6.28268 and the second has a score of 6.0317802.
"The second hit, Francis R Williamson, with a matching birth date, is an obvious better match than the first item, with no birth date, first names out of order, and a different surname. At least it was on the first page this time."
This depends on your definition of "obvious." The first result has both Francis and Robert, and the order doesn't matter since exact is not checked. That is going to give more name points than Francis and an unknown name starting with R. As for the last name, you did not do an exact search, so the routine will be treating all cognate names the same. Since Williams and Williamson are the same name they probably got the same number of last name points. Residence is equal in both. That just leaves the problem of dates. Those often seem to be treated in some arcane way that doesn't seem to make sense here unless the dates are so close that the points for dates are very close. That is why I want the RootsTech lecture! I would have to guess that the first name point difference here is so large that that is what is making the difference.
One way to test things out is to modify your search by adding in exact searches.
First off, if I check the box next to Williamson to require that to be exact, ( https://www.familysearch.org/search/record/results?count=20&q.birthLikeDate.from=1929&q.birthLikeDate.to=1931&q.fatherGivenName=Arthur&q.fatherSurname=Williamson&q.givenName=Francis%20Robert&q.motherGivenName=Elvie&q.motherSurname=Gustason&q.residenceDate.from=1930&q.residenceDate.to=2017&q.residencePlace=Michigan&q.surname=Williamson%20&q.surname.exact=on ) then the first record disappears and the second record climbs to the top. Note that all the records now have the last name of Williamson instead of the mix of Williamson and Williams in your results.
Second, I'll uncheck the exact against Williamson and check it next to Francis Robert. ( https://www.familysearch.org/search/record/results?count=20&q.birthLikeDate.from=1929&q.birthLikeDate.to=1931&q.fatherGivenName=Arthur&q.fatherSurname=Williamson&q.givenName=Francis%20Robert&q.givenName.exact=on&q.motherGivenName=Elvie&q.motherSurname=Gustason&q.residenceDate.from=1930&q.residenceDate.to=2017&q.residencePlace=Michigan&q.surname=Williamson%20 )
Now there are no results because there is no one with that first name at all that fits this search.
Changing the first name to Robert Francis with an exact search ( https://www.familysearch.org/search/record/results?count=20&q.birthLikeDate.from=1929&q.birthLikeDate.to=1931&q.fatherGivenName=Arthur&q.fatherSurname=Williamson&q.givenName=Robert%20Francis&q.givenName.exact=on&q.motherGivenName=Elvie&q.motherSurname=Gustason&q.residenceDate.from=1930&q.residenceDate.to=2017&q.residencePlace=Michigan&q.surname=Williamson%20 ) gives just one result, that same record that was viewed as the original first match.
So looking again at the first two results, we probably have a wide point spread on the first name, equal points on the last name, narrow range on the date, and equal points on the place. You can go through the rest of the results and see how the configuration of the results affects the score. So the results do generally make sense. They only don't make sense if you view the search as a simple character-for-character text matching process, but historical records are so complex that would never work.
0 -
"Williams" is not the same name as "williamson." If they are only looking at Miracode, that is another mistake. Miracode is useful for some languages but even then it is not "all there is." It doesn't work well for French, which often has silent letters. Many of my French Canadian relatives have Grosleau, Groslot, Groleaux, Grolo, and Grolot for the same person in different records. One marriage record has two of those in the same record.
0 -
I don't know what the current process is that FamilySearch uses to determine if names are equivalent or not. Way back in the days of New Family Search, there was actually a searchable database the public could access that contained long lists of names that were treated as the same name in searches. I found it quite useful and wish that, if they still use it, the database were still available.
In that database you could type in a first name or last name, such as Christopher, and see all the names that the search engine would treat as the same name: Christopher, Christofer, Christoffer, Kristopher, Kristoffer, Kristofer, Xistofer, Xistoffer, Xistopher, Cristoffer, Cristofer, Cristopher. And the entries for a name were not limited to spelling variants. I don't remember many of these, but as one example, Johannes and Hans were on the same list. For some names there would be many dozens of names in multiple languages.
That the search engine is still doing something similar and not using Miracode is confirmed by looking at your list of results and seeing that not only are Williams and Williamson being treated as the same name, but there is also a McWilliams in the list of results.
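For what it's worth, a lookup of that kind can be sketched in a few lines. The two groups below contain only the names I listed above from memory; the structure (`NAME_GROUPS`, `same_name`) is my own guess at the general idea and says nothing about what FamilySearch actually stores.

```python
# Two equivalence groups, filled only with the names remembered from the old database.
NAME_GROUPS = [
    {"christopher", "christofer", "christoffer", "kristopher", "kristoffer",
     "kristofer", "xistofer", "xistoffer", "xistopher", "cristoffer",
     "cristofer", "cristopher"},
    {"johannes", "hans"},
]

# Map every known name to the number of the group it belongs to.
GROUP_OF = {name: i for i, group in enumerate(NAME_GROUPS) for name in group}

def same_name(a, b):
    """True if the names are identical or sit in the same equivalence group."""
    a, b = a.strip().lower(), b.strip().lower()
    if a == b:
        return True
    group = GROUP_OF.get(a)
    return group is not None and group == GROUP_OF.get(b)

print(same_name("Xistopher", "Kristofer"))  # True
print(same_name("Hans", "Johannes"))        # True
print(same_name("Hans", "Christopher"))     # False
```

A table like this only knows the pairs someone has entered, so a spelling that nobody added to a group is simply treated as a different name.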
0 -
One thing we've not had a chance to discuss is how to search efficiently depending on your goal. In the search you posted above, what was the goal you had? Not many types of record sets contain Residence as a field. Census records do and that Public Records database (which is a pretty unreliable database with a lot of junk in it) that most of that first page of results come from does. But there are not many others.
Also, in a lot of records, I have no idea what percentage, full names are not used.
So unless you are specifically looking for census records I probably would not use Residence as a parameter. And I would do two different first name searches, one using Fran* and one using Rob*. Also, in all these searches if you are absolutely sure that no clerk anywhere at any time would have spelled the last name other than Williamson, then I would also suggest clicking the exact search box next to the last name.
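To get a feel for what those two wildcard searches would let through, here is a small sketch using Python's shell-style pattern matching, where `*` stands for any run of characters. I am assuming the site's `*` behaves the same way, which may not hold in every detail, and the list of indexed names is made up.

```python
from fnmatch import fnmatchcase

def wildcard_hits(pattern, names):
    """Return the names matching a shell-style pattern, ignoring upper/lower case."""
    pattern = pattern.lower()
    return [name for name in names if fnmatchcase(name.lower(), pattern)]

# Made-up given names as they might appear in an index.
indexed = ["Francis", "Frank", "Frances", "Fred", "Robert", "Robt", "Roberta", "F R"]

print(wildcard_hits("Fran*", indexed))  # ['Francis', 'Frank', 'Frances']
print(wildcard_hits("Rob*", indexed))   # ['Robert', 'Robt', 'Roberta']
```

Note that neither pattern catches a record indexed only with initials, so a third search may still be needed for those.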
1 -
The residence parameter makes it easier to get a specific census record. Also, it sometimes suggests a date of birth and in the past (apparently this feature was removed) it would give clues to the people they were living with. And it is another way to narrow down the list. Or should be, but it is apparently another parameter that is ignored (saying Michigan doesn't stop lots of Europe being high on the list).
Doing one name at a time is an approach I sometimes use. But I didn't know it allowed a wildcard.
The idea of a name equivalence list is good, but it never matched Groslot to Groleau. Or Frączek to Fronczek or Nędza to Nedza/Nendza. And while it is good to know that Miracode isn't the exclusive method, the method that matches not-even-close Mexican names to Groleau certainly has room for improvement.
Full names do match to initials, but (I think) not if 'exact' is ticked. This is useful, but it does put many more false positives on the list. And since the priority criteria are less than ideal, …
There's a lot of useful info in this thread, but I still retain my original complaint that details specifically requested are being ignored. To which I'll add the complaint that the priority criteria are less than ideal.
0 -
"I still retain my original complaint that details specifically requested are being ignored. To which I'll add the complaint that the priority criteria are less than ideal."
There will always be room for improvement. And I will say that I find that all these modern search routines that copy Google in throwing in everything but the kitchen sink are trying to be way too helpful. It would certainly not break my heart if year ranges were strictly enforced. Oh, and one thing I forgot to mention is that if search results have exactly the same score, they appear to be presented in alphabetical order. That can be another reason why the hoped-for result is last in a list of half a dozen similar but not as good matches.
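That tie-breaking behaviour is easy to picture as a two-part sort: score first, name second. The sketch below uses invented names and scores, and it only illustrates what I think I am seeing, not anything confirmed by FamilySearch.

```python
# (name, score) pairs; the names and scores are invented for illustration.
results = [
    ("Williamson, Francis R", 6.03),
    ("Adams, Robert", 6.03),
    ("Williams, Robert Francis", 6.28),
    ("Baker, Frank", 6.03),
]

# Highest score first; equal scores fall back to alphabetical order by name.
ordered = sorted(results, key=lambda r: (-r[1], r[0]))
for name, score in ordered:
    print(score, name)
```

Here the Williamson entry lands at the bottom of the three tied results simply because W sorts after A and B.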
A couple of years ago the search routine had a feature that I really miss. In the results list, after the first couple of dozen results or less, there would be a blue bar and a notification that read something like this, "The following results do not match your search criteria but may be of interest." I wish they would bring that back. For your search, that blue bar may have shown up above the first entry.
I hope you got a bit of useful information in how to deal with the current search routine while we wait for further improvements in it.
1 -
I remember that blue bar. Didn't really register that it has not been there lately. Agree about honoring the year limits. I am quite capable of widening them if I think it will help. I don't need some programmer to automatically widen them for me.
I was a software engineer for over three decades, so I understand that users sometimes don't know what they're doing. But I also know well that trying to predict and hard-code what they might think before they do it will only make things worse.
0