
Filter Last Names

dontiknowyou
dontiknowyou ✭✭✭✭✭
October 8, 2021 edited October 8, 2021 in Suggest an Idea

I love how search results filters have been moved forward in the new Search interface. The search bar on the filter menu is a very nice enhancement. Please add a filter: Filter by Last Names similar in concept to Filter by Collection.

This filter would produce an extremely useful list of name permutations: stems, soundalikes, lookalikes. So often brick walls are due to spelling changes.

Here is a use case. Inwinkelried is a very rare surname with just 32 exact matches in FS historical records, but 648,436 non-exact matches. How to find Inwinkelried and variant needles in a haystack this large? Doing this by hand is not practical.

Inwinkelried 32

Imwinkelreid 20

Winkelried 371

Winkelreid 57

Winkler 451,796

Winklereid 7

Winkleried 9

. . .

Couldn't this be implemented easily using the filter functionality?

[Edited to remove a tangent.]

Screen Shot 2021-10-08 at 10.59.26 AM.png


Tagged:
  • New
  • Record Searching
3 votes

New · Last Updated October 8, 2021

Comments

  • Julia Szent-Györgyi
    Julia Szent-Györgyi ✭✭✭✭✭
    October 8, 2021

    The biggest problem with this idea is that different languages have completely different patterns, and FS has no way of knowing what language it should be "thinking" in for any particular search. As far as I know, the search algorithm for surnames uses a variation on Soundex, which is optimized for the American melting pot, but often fails miserably on said pot's less-common ingredients.
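
    For reference, here is a minimal Python sketch of the classic American Soundex. It is not FamilySearch's algorithm (what FS actually runs is not known in this thread); it only illustrates the kind of coding being discussed. The function name `soundex` and the decision to drop non-ASCII letters are assumptions made for the sketch.

```python
# Classic American Soundex, as a sketch. Letters outside A-Z (for example
# accented letters) are simply dropped here, which is an assumption.
_CODES = {}
for _digit, _letters in {
    "1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R",
}.items():
    for _letter in _letters:
        _CODES[_letter] = _digit


def soundex(name):
    """Return the 4-character Soundex code for a name ('' if no letters)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    result = [first]
    prev = _CODES.get(first, "")  # the first letter's code still suppresses a repeat
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate consonants with the same code
        code = _CODES.get(c, "")
        if code and code != prev:
            result.append(code)
        prev = code  # vowels reset prev, so a repeated code after a vowel counts
    return ("".join(result) + "000")[:4]
```

    With this sketch, Inwinkelried and Imwinkelried both code to I552, but Winkelried codes to W524, the same as Winkler. A changed first letter hides a real variant, while a very common name collides with a rare one.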

    Also, are you aware of wildcards? (Asterisk for any number of characters, including none, and question mark for a single character.)
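
    To make the wildcard semantics concrete, here is a small Python sketch that translates such a pattern into a regular expression. It only models the two rules just described (asterisk and question mark); the helper names `wildcard_to_regex` and `matches` are made up for illustration, and any further rules FS applies to wildcard searches are not modelled.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard pattern into a case-insensitive compiled regex.

    '*' matches any run of characters (including none);
    '?' matches exactly one character; everything else is literal.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def matches(pattern, name):
    """True if the whole name matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(name) is not None
```

    For example, `matches("*winkelrie?", "Imwinkelried")` is true, while `matches("?winkelried", "Inwinkelried")` is false because the question mark stands for exactly one character.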

    1
  • dontiknowyou
    dontiknowyou ✭✭✭✭✭
    October 8, 2021

    How FS does matching, their version of a Soundex, is a rabbit hole we do not need to go down right now. Often, I find it much more immediately important to get a list of names in the results.

    Yes, I have used wildcards here since I first arrived. This idea is in addition to phonetic matching and wildcards or (dreaming) regular expressions.

    This idea is that, given record search results, the user could see a list of names and count of records matching each name. Like the Collections filter, but for names.
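
    In code terms the request is just a tally over the surname field of the result set. A minimal Python sketch, assuming the surnames have already been pulled out of the results into a plain list (the function name `surname_facets` is hypothetical):

```python
from collections import Counter


def surname_facets(surnames):
    """Return (surname, record_count) pairs, most frequent first.

    Ties are broken alphabetically; blank entries are ignored.
    """
    counts = Counter(s.strip() for s in surnames if s.strip())
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

    The output has the same shape as the Collections filter: one row per distinct value with a count beside it.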

    0
  • LDS Search Test
    LDS Search Test ✭✭✭
    October 8, 2021 edited October 8, 2021

    Hi, sorry, me again.

    How about this?

    Inwinkelried Search.jpg

    See : https://community.familysearch.org/en/discussion/104666/new-search-tips-and-tricks#latest

    @dontiknowyou It finds lots of Imwinkelried, and also some Juwinkelried! Is that a real name, or did somebody misread 'In' as 'Ju'?

    2
  • LDS Search Test
    LDS Search Test ✭✭✭
    October 8, 2021 edited October 8, 2021

    Might need to work on the matching criterion ....

    Inwinkelried corruption.jpg


    1
  • LDS Search Test
    LDS Search Test ✭✭✭
    October 8, 2021

    On a connected note: I was surprised to discover that when I set the preferences in the new search to Data Sheet format, customize it to remove all additional data (leaving only the person and their basic vitals in the results), and then choose to export the data, the entire data set is exported, not the reduced set.

    I had presumed that the export would be what I had customized.

    1
  • dontiknowyou
    dontiknowyou ✭✭✭✭✭
    October 8, 2021

    Questions about spelling of surnames are a big part of any surname study. Are the spelling differences variants (evolution) or deviants (phonetic spelling, transcription errors, typos)? Building trees is one way to find out. Building trees in FT tackles most deviants without any extra effort by the researcher, leveraging consilience in the surrounding tree.

    That's why I want a list of surnames in a set of search results.

    0
  • genthusiast
    genthusiast ✭✭✭✭✭
    October 10, 2021

    I like this idea better than my (negate) wildcard idea. Surely a computer can parse and match like names, which implies such filters could be made available, and that would be more powerful than the negate idea. (upvote)

    FS has no way of knowing what language it should be "thinking" in for any particular search. 

    FamilySearch website does have a language setting (bottom of FamilySearch pages) next to COOKIES PREFERENCES. It does know what language the user would like to 'read'.

    I was surprised to discover that when I set the preferences in the new search to Data Sheet format, customize it to remove all additional data (leaving only the person and their basic vitals in the results), and then choose to export the data, the entire data set is exported, not the reduced set.

    Export of search results to spreadsheet does export everything. You can hide the columns you don't wish to see.

    Building trees is one way to find out. Building trees in FT tackles most deviants without any extra effort by the researcher, leveraging consilience in the surrounding tree.

    Are you saying you like to add trees as Unconnected persons?

    1
  • LDS Search Test
    LDS Search Test ✭✭✭
    October 10, 2021

    I'm convinced that it does need to be a filter within FS. Because, although it's true that you could achieve the same result by exporting the data and then processing it at the user's end, the export would have to be huge to ensure that it included all possible variations.

    The only alternative I can envisage might be to allow a simple form of macro scripting to enable the user to "Extract Data" limited to a predefined set.

    This might also provide what another user has asked for, sorting results.

    Thinking as I'm typing, though, it could be implemented quite easily as a fixed set of routines at the FS end, provided within the preferences section.

    This could be done by commandeering the Preferences - Format - Data Sheet - Customize Data Sheet function as a basis and offering it as an option similar to Export Search Results. This would allow the user to select only the data required (with an additional option to use Full Name or to separate First Names and Surname).

    I think that the capability to choose "Sort results A-Z" needs to be in the Format section though. The user shouldn't have to export results just to see them sorted alphabetically.

    2
  • dontiknowyou
    dontiknowyou ✭✭✭✭✭
    October 10, 2021 edited October 10, 2021

    I'm convinced that it does need to be a filter within FS.

    Then please use the Upvote button on the opening post.


    About phonetic matching. Whatever FS is using, it isn't the original Soundex (wikipedia). Phonetic indexing algorithms (wikipedia) are a hot research topic; commercial applications are growing rapidly. https://forebears.io has its own algorithm (a trade secret?) and returns lists of names (see screenshot). I use this site to plan searches on FS. But forebears.io lists are only an approximate solution, because they are not built on FS historical records.

    Screen Shot 2021-10-10 at 7.59.04 AM.png

    Steve Morse is a developer of phonetic indexing algorithms that take language origins into account. His https://stevemorse.org demonstration of the algorithm on Ellis Island and other immigrant passenger databases is a goldmine of names and misspellings for genealogy and surname research.

    Returning to FS and the Search interface. Regardless of the algorithm that FS uses in Search, Find, and Record Hints, many users of Search still need a list of all names returned in search results.

    0
  • LDS Search Test
    LDS Search Test ✭✭✭
    October 10, 2021 edited October 10, 2021

    Ok it sparked my interest, so I thought I'd see what I could do with what's currently available.

    What I did: set search to display 100 results per page. In More Options > Preferences > Format, select Data Sheet, then Customize Data Sheet, and turn off all additional data.

    Search for surname *??winke?rie?

    Now instead of exporting results, I used the mouse to select everything from Name, to the end of the last person record, then copied and pasted into a text file. I ran this file through the unix stream editor 'sed' with a rough and ready match to pull out all the surnames and then passed them through sort and another unix tool 'uniq' to get the unique results.
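
    The same sed / sort / uniq step can be approximated in a few lines of Python. This is only a sketch: the layout of the pasted Data Sheet text is an assumption, the regular expression is a rough equivalent of the wildcard search above rather than the exact sed match used, and the function name `unique_surnames` is made up.

```python
import re

# Rough regex equivalent of the wildcard *??winke?rie? applied to whole words:
# an optional prefix (letters or hyphens), two letters, "winke", one letter,
# "rie", one final letter, then a word boundary.
SURNAME_RE = re.compile(r"[A-Za-z-]*[A-Za-z]{2}winke[a-z]rie[a-z]\b")


def unique_surnames(pasted_text):
    """Return the distinct matching surnames in the pasted text, sorted."""
    return sorted(set(SURNAME_RE.findall(pasted_text)))
```

    Pasting each further page of results into the same text and re-running gives the cumulative list, the same as appending to the text file and repeating the pipeline.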

    There were 14 variants matching the wildcard criterion above in the first 100 search results. I then copied the content from the remaining 3 pages of results into the same text file, and did the same again. Here's the result:

    Boero-Imwinkelried

    Boeroimwinkelried

    Fenwinkebried

    Fenwinkelried

    Finwinkelried

    Imwinkebried

    Imwinkebrier

    Imwinkeiried

    Imwinkelried

    Inewinkelried

    Inwinkelried

    Irwinkelried

    Ivowinkelried

    Junwinkelried

    Juwinkebried

    Juwinkelried

    Smwinkelried

    Surwinkelried

    Tenwinkelried

    Timwinkelried

    Tinwinkelried

    Truwinkebriel

    Ynswinkelried

    Zuwinkelried

    These three managed to avoid detection until I started looking:

    Bmwinkelried

    Inwinkelrie

    Imwinkeirted

    2
  • dontiknowyou
    dontiknowyou ✭✭✭✭✭
    October 11, 2021

    Basically what I do, but even with regular expressions, grep, sed, awk, uniq, vi search and replace, even perl scripts, it is still a tedious chore. Which is exactly why extracting a list of names needs to be a tool built in, so everyone can use it.

    Here is one of my lists of variants and deviants (Guild of One Name Studies jargon):

    Amhof

    Amhuf

    Amhuff

    Earhuff

    Einhoff

    Einhuff

    Emhof

    Emhoff

    Emhoof

    Emhough

    Emhuf

    Emhuff

    Emkoff

    Emoff

    Emtruff

    Enchoff

    Enhoff

    Enhuff

    Erhuff

    Ernhoff

    Ernhuff

    Eruhuff

    Euhuff

    Eunphuff

    Finhoff

    Hemenhoff

    Hemhoff

    Hemhuff

    Hemmhoff

    Heunnhoff

    Humoff

    Imhaf

    Imhoaf

    Imhof

    Imhoff

    Imhoof

    Imhooff

    Im Hooff

    Imhuf

    Immhof

    Immhoff

    Immhooff

    Inhofe

    Inhoff

    Inhoof

    Iruhoff

    Iuhoff

    Iuhuff

    Omhoff

    Omhoof

    Omhuff

    Umhoff

    Umhoof

    Umhuff

    Ymoff

    . . . And still very incomplete. After generating this list I added several more spellings, and I know I have under-sampled the variations split in two, similar to Im Hoof. Iruhoff and Iuhoff strike me as training data for an evil twin of phonetic indexing: visual indexing. "ru" is a common misreading of "m" and of course "u" is a common misreading of "n".
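
    A sketch of what "visual indexing" could look like in Python: generate the spellings reachable from a name by one look-alike substitution. The confusion table holds only the two misreadings mentioned above and is purely illustrative; a real table would be far larger and script-dependent. The function name `lookalike_variants` is hypothetical, and the sketch does not handle two-word spellings such as Im Hooff.

```python
# Illustrative confusion pairs only, taken from the two examples above.
CONFUSIONS = [("m", "ru"), ("n", "u")]


def lookalike_variants(name):
    """Return spellings one look-alike substitution away from the name.

    Each pair is applied in both directions, at one position at a time.
    Results are capitalized as single words and sorted.
    """
    lower = name.lower()
    variants = set()
    for a, b in CONFUSIONS:
        for src, dst in ((a, b), (b, a)):
            start = lower.find(src)
            while start != -1:
                variant = lower[:start] + dst + lower[start + len(src):]
                variants.add(variant.capitalize())
                start = lower.find(src, start + 1)
    variants.discard(lower.capitalize())
    return sorted(variants)
```

    Feeding Imhoff through this gives Iruhoff, and Inhoff gives Iuhoff, which are exactly the two oddities in the list above.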

    0
  • dontiknowyou
    dontiknowyou ✭✭✭✭✭
    October 11, 2021

    By the way, I am hoping FamilySearch uses or soon will use Family Tree as training data for its name indexing algorithms.

    So far, FT hints do not seem to know about look-alike transcription errors. I am having to search historical records by hand to find look-alike spelling variations such as Lerois for Lewis. I generate a list of possible variations mostly by crawling around on the FS research wiki, reading pages about reading handwriting.

    There are two ways FS could leverage contributor work to build such pattern matching algorithms:

    1. Compare names on FT to names on attached historical records.
    2. Collect our individual edits to FS historical record indexing errors. (We are of course also training the next generation of OCR indexing.)
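
    As a rough illustration of option 1, here is a Python sketch that compares a tree spelling with the indexed spelling on an attached record and tallies which substrings were swapped. It uses the standard-library `difflib.SequenceMatcher`; the function names and the sample pairs in the note below are made up for illustration, and nothing here reflects how FS actually trains anything.

```python
from collections import Counter
from difflib import SequenceMatcher


def substitution_pairs(tree_name, indexed_name):
    """Return (tree_substring, indexed_substring) pairs that differ."""
    a, b = tree_name.lower(), indexed_name.lower()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        (a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag == "replace"  # pure insertions and deletions are ignored here
    ]


def tally_substitutions(name_pairs):
    """Count how often each substitution occurs across many name pairs."""
    counts = Counter()
    for tree_name, indexed_name in name_pairs:
        counts.update(substitution_pairs(tree_name, indexed_name))
    return counts
```

    For the pair Lewis / Lerois this yields the substitution "w" to "ro". Run over many attached records, the most frequent pairs would be candidate look-alike rules.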

    Providing lists of names in search results would support this very important infrastructure work.

    0
  • LDS Search Test
    LDS Search Test ✭✭✭
    October 11, 2021

    Keeping the topic live, and reiterating my earlier "it does need to be a filter" observation.

    1: It is not possible to 'automate' (Mac speak), create a macro (Windows speak), or write a script (generic) to extract the data at the user's end. The reasons are primarily those of browser security. The macOS 'Automation' function specifically prevents scripts from manipulating those elements of the web page that would allow the data the OP requires to be collected. I presume that Windows and Android will implement the same restrictions.

    2: The ability to search records is not available via the API. Although in theory it 'might' be possible to fabricate a mechanism to automatically collect and process data obtained from a machine-conducted search using the normal web interface, to attempt to conduct such an activity would be more likely to cause significant adverse reaction, if not damage.

    Conclusion: The desired result can only be accomplished by implementing a filter at the FS end, i.e. in the 'improved' search, or its improved replacement.

    1
  • Julia Szent-Györgyi
    Julia Szent-Györgyi ✭✭✭✭✭
    October 11, 2021

    Going back a ways in the discussion, someone said:

    FamilySearch website does have a language setting (bottom of FamilySearch pages) next to COOKIES PREFERENCES. It does know what language the user would like to 'read'.

    The interface language has absolutely no bearing on the language of the records one is searching through. I leave the interface set to English, but the names are in Latin, Hungarian, German, Slovak, or sometimes a mixture (like Schuszter), never in English.

    It'd be nice if one could do true phonetic pattern-matching, but sometimes, the intended phonetics are impossible to determine. For example, the family now pronounces my great-grandmother's surname as [hɛjtlɛr], because that's what the usual spelling of "Heitler" comes out to in Hungarian, but the original German would've been more like [haɪtlɛr], and I don't know which one she or her parents used. My mother and her sisters do not have the kind of ear for language that would allow them to remember -- or even notice -- such a detail. (Remember that scene in The Little Mermaid where the crab whispers "are-ee-el" and the prince automatically hears "air-ee-el"?)

    And then there's misreading-based pattern matching: my aforementioned great-grandmother's baptism was originally indexed as Keiszer. This is a mix of "understandable/usual" (K versus H), "not impossible" (s versus t), and "huh?" (z versus lower-case L). I shudder to imagine the size of the database required to make any sense of the possibilities, and there are some patterns that are so entirely context-dependent that I'm not sure a unified approach can ever even work. For example, in English, that B-like thing is probably a B, but in German, it's much more likely to be ß. Similarly, in Latin letters, 'e' is highly unlikely to be mistaken for 'n', whereas in That Dratted German Script, the two letters are functionally identical.

    What it boils down to is that I'm not sure a unified database approach to finding patterns in names has any utility. I'm thinking that what we need is a highly flexible character-level search (such as regular expressions) combined with Wiki pages compiling useful search terms and strategies for each different context. Of course, determining useful context categories also gets very fuzzy very quickly.... Language Is Hard, no matter which way you turn it.
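
    A sketch of how such context-specific search terms might be expressed in Python: a per-context table of confusable letters is turned into a regular expression with one character class per ambiguous letter. The table below contains only the confusions from my own example (H/K, s/t, l/z, and e/n in German script); it is illustrative, not a researched list, and the names `CONTEXT_CONFUSIONS` and `confusion_regex` are made up.

```python
import re

# Illustrative only: groups of letters that get mistaken for one another
# in a given context. A wiki page per context could maintain such a list.
CONTEXT_CONFUSIONS = {
    "german_script": ["en", "hk", "st", "lz"],
}


def confusion_regex(name, context):
    """Build a case-insensitive regex that tolerates the context's confusions."""
    groups = CONTEXT_CONFUSIONS.get(context, [])
    parts = []
    for ch in name.lower():
        group = next((g for g in groups if ch in g), None)
        parts.append("[" + group + "]" if group else re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)
```

    Built from Heitler in the German-script context, the pattern also matches Keiszer in full. With an unknown context it falls back to an exact, case-insensitive match.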

    2
  • genthusiast
    genthusiast ✭✭✭✭✭
    October 11, 2021 edited October 11, 2021

    To put some of this in layman's terms:

    1. It'd be nice to have a surname filter provided by FamilySearch, with the processing done on the FamilySearch end.

    2. This still could not provide full search results because language patterns may not be effectively captured by the originating Search.

    Conclusion: For those languages where the Search can capture and sort possible variants etc. - this would be a very helpful feature. I recommend an upvote.

    Comment: The Search> More Options> Preferences

    Language Options:

    Translated Text (should help understanding in desired translated language)

    Original Text (should help if the original language is spoken)

    0
  • dontiknowyou
    dontiknowyou ✭✭✭✭✭
    March 11 edited March 11

    I still want this feature. It is very important for assembling the tree because spelling differences, both variants and deviants, contribute so much to brick walls.

    Here is just a tiny portion of the variants list generated for just one surname study in which I participate.

    Screenshot 2023-03-11 at 9.09.59 AM.png


    0