
Author Topic: Regular Expression Help (Searching name patterns)  (Read 13179 times)


Circuit_Girl

    Topic Starter


    Rookie

    Regular Expression Help (Searching name patterns)
    « on: August 29, 2010, 01:46:23 PM »
    I am learning UNIX and the use of Regex.
    I need to write a regular expression that recognizes the following name patterns:

    Mr. John Public
    Ms. Ida Psuedonym
    Mister Baggins, Bilbo
    Ms Ham, Virginia

    * Honorifics may be missing (mr., ms., Mister)
    * Dot may be missing
    * Names may appear in either order (last name, first name; or first name last name)

    Any help would be greatly appreciated, thank you.
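
The requirements above can be sketched in Python as follows. This is only a rough sketch, assuming each input string holds a single name and nothing else (the thread never settles what the real data looks like). The names `NAME_RE` and `parse_name` are hypothetical, and the honorific list is limited to the three mentioned in the post.

```python
import re

# Hypothetical pattern: optional honorific, optional dot, then either
# "Last, First" or "First Last". Assumes the whole string is one name.
NAME_RE = re.compile(
    r"""
    ^\s*
    (?:(?P<title>Mister|Mr|Ms)\.?\s+)?        # honorific and its dot are optional
    (?:
        (?P<last_a>[A-Z][A-Za-z'-]*),\s*(?P<first_a>[A-Z][A-Za-z'-]*)   # Last, First
      |
        (?P<first_b>[A-Z][A-Za-z'-]*)\s+(?P<last_b>[A-Z][A-Za-z'-]*)    # First Last
    )
    \s*$
    """,
    re.VERBOSE,
)


def parse_name(text):
    """Return (title, first, last), or None if the string does not fit."""
    m = NAME_RE.match(text)
    if m is None:
        return None
    first = m.group("first_a") or m.group("first_b")
    last = m.group("last_a") or m.group("last_b")
    return (m.group("title"), first, last)


for sample in ["Mr. John Public", "Mister Baggins, Bilbo", "Ms Ham, Virginia", "Ida Psuedonym"]:
    print(sample, "->", parse_name(sample))
```

Anchoring with `^` and `$` is what makes the optional honorific workable here; in free-running text a missing honorific leaves nothing to distinguish a name from any other capitalized words.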

    ghostdog74



      Specialist

      Thanked: 27
      Re: Regular Expression Help (Searching name patterns)
      « Reply #1 on: August 29, 2010, 07:17:01 PM »
      You have been learning since 2009, and you have already touched things like Perl, awk, and *nix. So what have you tried, and what have you got so far? Also, you are trying to match names, but does your data contain only names, or is there other information as well?


      Sidewinder



        Guru

        Thanked: 139
      • Experience: Familiar
      • OS: Windows 10
      Re: Regular Expression Help (Searching name patterns)
      « Reply #2 on: August 30, 2010, 03:43:39 PM »
      With regular expressions the devil is in the details. I had help from the freebie RegexBuilder, which works well with PerlScript and VBScript and probably with other engines as well.

      Code: [Select]
      \b[A-Z]+[a-z\.]{1,}\s[A-Z]+[A-Za-z,\s\.\-']{1,}[A-Z]+[A-Za-z,\s\.\-']{1,}\b

      This should also pick up names that are hyphenated, include an apostrophe, or have a letter initial for a first name. Not all regex engines behave exactly the same, so results may vary.
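
A quick way to try the pattern above is to run it over the four sample names. The sketch below uses Python's `re` module, which is an assumption (the post mentions PerlScript and VBScript); the three extra names are made up to exercise the apostrophe, hyphen, and initial cases.

```python
import re

# Pattern copied verbatim from the post above.
PATTERN = re.compile(
    r"\b[A-Z]+[a-z\.]{1,}\s[A-Z]+[A-Za-z,\s\.\-']{1,}[A-Z]+[A-Za-z,\s\.\-']{1,}\b"
)

# The four names from the original question.
samples = [
    "Mr. John Public",
    "Ms. Ida Psuedonym",
    "Mister Baggins, Bilbo",
    "Ms Ham, Virginia",
]

# Hypothetical extra names: apostrophe, hyphen, single-letter initial.
extras = ["Mr. Peter O'Toole", "Ms. Anne-Marie Smith", "Mr. J. Public"]

matched = [s for s in samples if PATTERN.fullmatch(s)]
extra_matched = [s for s in extras if PATTERN.fullmatch(s)]

print(matched)
print(extra_matched)
```

All seven strings match in Python; other engines may differ, as noted above.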

      Good luck.  8)
      « Last Edit: August 30, 2010, 04:35:35 PM by Sidewinder »
      The true sign of intelligence is not knowledge but imagination.

      -- Albert Einstein

      Fields



        Beginner

        Thanked: 3
        Re: Regular Expression Help (Searching name patterns)
        « Reply #3 on: August 30, 2010, 06:40:27 PM »
        Quote
        I am learning UNIX and the use of Regex.
        I need to write a regular expression that recognizes the following name patterns:


        C:test>Display   reg.txt

        Mr. John Public
        Ms. Ida Psuedonym
        Mister Baggins, Bilbo
        Ms Ham, Virginia

        C:test>grep Ida  reg.txt
        Ms. Ida Psuedonym

        C:test>findstr Ida  reg.txt
        Ms. Ida Psuedonym

        C:test>grep Bilbo  reg.txt
        Mister Baggins, Bilbo

        C:test>findstr Bilbo  reg.txt
        Mister Baggins, Bilbo

        C:test>
        _______________________________

        grep is a command-line text search utility originally written for Unix. The name is taken from the first letters in global / regular expression / print, a series of instructions in text editors.  The grep command searches files or standard input globally for lines matching a given regular expression, and prints them to the program's standard output.

        http://en.wikipedia.org/wiki/Grep
        Member of the Human Race; Citizen of the World.

        ghostdog74



          Specialist

          Thanked: 27
          Re: Regular Expression Help (Searching name patterns)
          « Reply #4 on: August 30, 2010, 07:53:22 PM »
          @fields, do you even know what's happening?

          ghostdog74



            Specialist

            Thanked: 27
            Re: Regular Expression Help (Searching name patterns)
            « Reply #5 on: August 30, 2010, 07:55:11 PM »

            Code: [Select]
            \b[A-Z]+[a-z\.]{1,}\s[A-Z]+[A-Za-z,\s\.\-']{1,}[A-Z]+[A-Za-z,\s\.\-']{1,}\b


            But does it also pick up words that are capitalized but are not names or honorifics? I don't think the OP provided comprehensive enough data for us to help further.
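
The concern can be checked directly. The sketch below (Python's `re` is an assumption, and both test strings are made up) shows the quoted pattern accepting three capitalized words that are not a name, while rejecting a plain two-word name with no honorific.

```python
import re

# Pattern as quoted above.
PATTERN = re.compile(
    r"\b[A-Z]+[a-z\.]{1,}\s[A-Z]+[A-Za-z,\s\.\-']{1,}[A-Z]+[A-Za-z,\s\.\-']{1,}\b"
)

# Three capitalized words, none of them a name or honorific.
false_positive = PATTERN.search("Blah Blah Text")

# A real name, but with the honorific missing.
missed_name = PATTERN.search("John Public")

print(false_positive.group(0) if false_positive else None)  # Blah Blah Text
print(missed_name)                                          # None
```

The pattern effectively asks for three capitalized pieces in a row, so it says nothing about whether the first one is an honorific.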

            ghostdog74



              Specialist

              Thanked: 27
              Re: Regular Expression Help (Searching name patterns)
              « Reply #6 on: August 30, 2010, 09:15:52 PM »


              Here's an example using GNU grep:
              Code: [Select]
              $ more file
              Blah Blah Text Text Mr. John Public what a troll....
               blab blah Ms. Ida Psuedonym ABC def
              die die die Mister Baggins, Bilbo end end end
              go eat **** mrs Ham, Virginia you b****
              honorfics missing Peter Jackson bloody Mary.
              Here's the lady Miss Jennifer-Beals welcome her please.
              Mr Bill Richard is a troll

              $  grep -Pio "([Mm][rs][s]*\.*|Mister|Miss)\s+\w+[- \t,]*\w+[ \t]*" file
              Mr. John Public
              Ms. Ida Psuedonym
              Mister Baggins, Bilbo
              mrs Ham, Virginia
              Miss Jennifer-Beals
              Mr Bill Richard


              It doesn't handle missing honorifics, since you won't have a reliable way to know when it's a name and when it's just a capitalized word (that depends very much on your data).
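
For anyone without GNU grep's `-P` support, roughly the same search can be sketched in Python. This is a port under assumptions: `re.IGNORECASE` stands in for `-i`, `finditer` plus `group(0)` stands in for `-o`, and `find_names` is a hypothetical helper name. The input lines are the ones from the sample file above.

```python
import re

# Same pattern as the grep call; re.IGNORECASE plays the role of -i.
PATTERN = re.compile(
    r"([Mm][rs][s]*\.*|Mister|Miss)\s+\w+[- \t,]*\w+[ \t]*",
    re.IGNORECASE,
)

LINES = [
    "Blah Blah Text Text Mr. John Public what a troll....",
    " blab blah Ms. Ida Psuedonym ABC def",
    "die die die Mister Baggins, Bilbo end end end",
    "go eat **** mrs Ham, Virginia you b****",
    "honorfics missing Peter Jackson bloody Mary.",
    "Here's the lady Miss Jennifer-Beals welcome her please.",
    "Mr Bill Richard is a troll",
]


def find_names(lines):
    """Return only the matched text, like grep -o, minus trailing blanks."""
    return [m.group(0).strip() for line in lines for m in PATTERN.finditer(line)]


for name in find_names(LINES):
    print(name)
```

The line with "Peter Jackson" yields nothing, which is the missing-honorific limitation described above.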

              Sidewinder



                Guru

                Thanked: 139
              • Experience: Familiar
              • OS: Windows 10
              Re: Regular Expression Help (Searching name patterns)
              « Reply #7 on: August 31, 2010, 01:04:14 PM »
              I promised myself I would not post behind our resident troll, but feel it would be unfair to the OP since I posted before he got involved with this thread. The OP wins out, if she ever returns.

              I admit I swiped GD's pattern for the honorifics, but I get different results when using GREP and VBScript. I chose not to use the \w notation, specifically to exclude digits (\w also matches digits and the underscore).

              Pattern:
              Code: [Select]
              \b([M][rs][s]?\.?|Mister|Miss)\s[A-Z']{1,3}[a-z',]{1,}(\s|\-)([A-Z']{1,3}[a-z]{1,})\b

              GREP produced this list:
              Quote
              Mr. John Public
              Ms. Ida Psuedonym
              Mister Baggins, Bilbo
              mrs Ham, Virginia
              Miss Jennifer-Beals
              Mr Bill Richard

              VBScript produced this list:
              Quote
              Mr. John Public
              Ms. Ida Psuedonym
              Mister Baggins, Bilbo
              Miss Jennifer-Beals
              Mr Bill Richard

              Note that mrs Ham, Virginia is missing from the VBScript results. This is correct as mrs is not capitalized. Be aware of different engines producing different results.
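
The case-sensitive behaviour can be reproduced with a short sketch. Python's `re` is an assumption here (the post used GREP and VBScript), `matches` is a hypothetical helper, and the input is the sample file from the earlier grep example.

```python
import re

# Pattern copied from the post above; no flags, so matching is case-sensitive.
PATTERN = r"\b([M][rs][s]?\.?|Mister|Miss)\s[A-Z']{1,3}[a-z',]{1,}(\s|\-)([A-Z']{1,3}[a-z]{1,})\b"

LINES = [
    "Blah Blah Text Text Mr. John Public what a troll....",
    " blab blah Ms. Ida Psuedonym ABC def",
    "die die die Mister Baggins, Bilbo end end end",
    "go eat **** mrs Ham, Virginia you b****",
    "honorfics missing Peter Jackson bloody Mary.",
    "Here's the lady Miss Jennifer-Beals welcome her please.",
    "Mr Bill Richard is a troll",
]


def matches(lines):
    """Return the first match on each line, skipping lines with none."""
    found = []
    for line in lines:
        m = re.search(PATTERN, line)
        if m:
            found.append(m.group(0))
    return found


case_sensitive = matches(LINES)
print(case_sensitive)
```

Run this way, Python returns the same five names as the VBScript list, with the lowercase "mrs" line left out.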

              As GD mentioned, only the honorifics specifically mentioned were incorporated into the match pattern.

               8)
              « Last Edit: August 31, 2010, 01:47:51 PM by Sidewinder »

              BC_Programmer


                Mastermind
              • Typing is no substitute for thinking.
              • Thanked: 1140
                • BC-Programming.com
              • Experience: Beginner
              • OS: Windows 11
              Re: Regular Expression Help (Searching name patterns)
              « Reply #8 on: August 31, 2010, 01:11:18 PM »
              Quote
              I promised myself I would not post behind our resident troll, but feel it would be unfair to the OP since I posted before he got involved with this thread. The OP wins out, if she ever returns.

              One problem, though, is that IMO the OP <IS> our resident troll. He does it a lot, and it wastes everybody's time. Problem is, it's difficult to tell.
              I was trying to dereference Null Pointers before it was cool.

              Fields



                Beginner

                Thanked: 3
                Re: Regular Expression Help (Searching name patterns)
                « Reply #9 on: August 31, 2010, 01:25:21 PM »
                Quote
                One problem though, is that IMO the OP <IS> Fields. He does it a lot, and it wastes everybody's time. Problem is it's difficult to tell.

                Hey Boy (BCP) you are confused. Fields and the OP are not the same person. 

                Can BCP say "Paranoid to the Max"?  Is BCP a shut-in with unlimited imagination?

                The mental breakdown of BCP?

                BC_Programmer


                  Mastermind
                • Typing is no substitute for thinking.
                • Thanked: 1140
                • Experience: Beginner
                • OS: Windows 11
                Re: Regular Expression Help (Searching name patterns)
                « Reply #10 on: August 31, 2010, 01:39:59 PM »
                Wow he's really desperate for Troll food!  ::)


                Fields



                  Beginner

                  Thanked: 3
                  Regular Expression Help (Searching name patterns)
                  « Reply #11 on: August 31, 2010, 01:41:39 PM »
                  Quote
                  Fields, do you even know what's happening?

                  I know we should post sample output when we post code.  The Ghost is unable to learn that.

                  But The Ghost is right:  I do not see any need to corrupt the useful grep command and use grep in such a strange manner.  

                  _______________________________

                  Twenty years old and still in the eighth grade?

                  BC_Programmer


                    Mastermind
                  • Typing is no substitute for thinking.
                  • Thanked: 1140
                  • Experience: Beginner
                  • OS: Windows 11
                  Re: Regular Expression Help (Searching name patterns)
                  « Reply #12 on: August 31, 2010, 01:46:38 PM »
                  haha... as amusing as he was before, it's even funnier when nobody feeds him what he wants!

                  ghostdog74



                    Specialist

                    Thanked: 27
                    Re: Regular Expression Help (Searching name patterns)
                    « Reply #13 on: August 31, 2010, 06:05:24 PM »
                    Quote
                    Note that mrs Ham, Virginia is missing from the VBScript results. This is correct as mrs is not capitalized. Be aware of different engines producing different results.

                    Actually, it's not really the engine; it's because I used a case-insensitive [Mm] as well as the -i option for grep. That's why "mrs" is captured as well. In fact, [Mm] could be just M or m, since -i is used.
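
The point about the flag can be shown on the one line where the results differed. This is a sketch in Python (an assumption; the thread used grep and VBScript), with `re.IGNORECASE` standing in for `-i` and hypothetical names `GD`, `GD_PLAIN_M`, `SW`, and `first_match`.

```python
import re

LINE = "go eat **** mrs Ham, Virginia you b****"

# The grep pattern as posted, and the same pattern with [Mm] reduced to M.
GD = r"([Mm][rs][s]*\.*|Mister|Miss)\s+\w+[- \t,]*\w+[ \t]*"
GD_PLAIN_M = r"(M[rs][s]*\.*|Mister|Miss)\s+\w+[- \t,]*\w+[ \t]*"

# The stricter pattern from reply #7.
SW = r"\b([M][rs][s]?\.?|Mister|Miss)\s[A-Z']{1,3}[a-z',]{1,}(\s|\-)([A-Z']{1,3}[a-z]{1,})\b"


def first_match(pattern, text, flags=0):
    """Return the first match with surrounding blanks stripped, or None."""
    m = re.search(pattern, text, flags)
    return m.group(0).strip() if m else None


# With the flag on, [Mm] and plain M behave the same.
print(first_match(GD, LINE, re.IGNORECASE))
print(first_match(GD_PLAIN_M, LINE, re.IGNORECASE))

# The stricter pattern skips "mrs" only while matching is case-sensitive.
print(first_match(SW, LINE))
print(first_match(SW, LINE, re.IGNORECASE))
```

Within a single engine, toggling the flag alone is enough to make "mrs Ham, Virginia" appear or disappear.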

                    Sidewinder



                      Guru

                      Thanked: 139
                    • Experience: Familiar
                    • OS: Windows 10
                    Re: Regular Expression Help (Searching name patterns)
                    « Reply #14 on: September 01, 2010, 05:14:23 AM »
                    Quote
                    Actually, it's not really the engine; it's because I used a case-insensitive [Mm] as well as the -i option for grep. That's why "mrs" is captured as well. In fact, [Mm] could be just M or m, since -i is used.

                    I don't understand the reasoning behind building a regex pattern that specifically masks upper case letters (Mister, Miss, M) and then directing the regex engine to run in case-insensitive mode. VBScript has such a feature too (different syntax), but in all my experience I have never seen anyone use it... not once. Oh well, live and learn.

                    It's all academic anyway. Nifty concept: if Fields and Circuit_Girl turn out to be the same person, he can ask and answer his own questions. Reminds me of the Mother and Norman relationship.

                     8)