
Topic: how to get the Count of string in file


BC_Programmer


    Mastermind
  • Typing is no substitute for thinking.
  • Thanked: 1140
    • BC-Programming.com
  • Experience: Beginner
  • OS: Windows 11
Re: how to get the Count of string in file
« Reply #75 on: August 14, 2010, 05:04:08 PM »
It works. I had to remove an accidental type declaration that was still left over, since the "original" is VB6.

I also added a case-insensitive option, "/i".

Code: [Select]
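Function GetCountStr(ByVal searchIn, ByVal SearchFor, ByVal CompareText)
    ' Abs(CBool(x)) is 0 (vbBinaryCompare) for False and 1 (vbTextCompare) for True
    CompareText = Abs(CBool(CompareText))
    ' Remove every occurrence; each one shortens the text by Len(SearchFor) characters
    GetCountStr = (Len(searchIn) - Len(Replace(searchIn, SearchFor, "", 1, -1, CompareText))) / Len(SearchFor)
End Function


Dim inputstrm
Dim lookin, lookfor
Dim looparg, ignorecase

ignorecase = False
'see if /i was specified....
For Each looparg In WScript.Arguments
    If UCase(looparg) = "-I" Or UCase(looparg) = "/I" Then
       ignorecase = True
       Exit For
    End If
Next

' Usage: countstr.vbs <file> <string> [/i]
Set inputstrm = CreateObject("Scripting.FileSystemObject").OpenTextFile(WScript.Arguments(0))
lookfor = WScript.Arguments(1)
lookin = inputstrm.ReadAll()   ' reads the whole file into memory
inputstrm.Close
WScript.Echo GetCountStr(lookin, lookfor, ignorecase)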
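For anyone without Windows Script Host, here is a rough Python translation of the same replace-and-divide idea. It assumes, as with ReadAll(), that the whole text fits in memory. The name `get_count_str` and the `ignore_case` parameter are illustrative stand-ins for GetCountStr and the /i switch, not part of the original script.

```python
import re


def get_count_str(search_in: str, search_for: str, ignore_case: bool = False) -> int:
    """Count non-overlapping occurrences by deleting them and measuring the shrinkage."""
    if not search_for:
        raise ValueError("search_for must not be empty")
    # re.escape makes the search string literal; IGNORECASE plays the role of vbTextCompare
    flags = re.IGNORECASE if ignore_case else 0
    stripped = re.sub(re.escape(search_for), "", search_in, flags=flags)
    # every removed occurrence shortened the text by exactly len(search_for) characters
    return (len(search_in) - len(stripped)) // len(search_for)


sample = "Caltech colleague, built Caltech, at CALTECH"
print(get_count_str(sample, "Caltech"))        # 2 (case-sensitive)
print(get_count_str(sample, "caltech", True))  # 3 (like the /i switch)
```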

I shall now endeavour to emulate the ridiculous manner in which Bill tests his code. I will refrain from the classic posting of the output from dir /? for no reason though.

test "input" file, "zwicky.txt":

Quote
in the 1930s and 1940s, many of Fritz Zwicky's colleagues regarded him as an irritating buffoon. Future generations of astronomers would look back on him as a creative genius.
    "By the time I knew Fritz in 1953, he was thoroughly convinced that he had the inside track to ultimate knowledge, and that everyone else was wrong," says William Fowler, then a student at Caltech (The Californian Institute of Technology) where Zwicky taught and did research. Jesse Greenstein, a Caltech colleague of Zwicky's from the late 1940's onward, recalls Zwicky as "a self-proclaimed genius... There's no doubt that he had a mind which was quite extraordinary, But he was also, although he didn't admit it, untutored and not self-controlled.
... HE taught a course in physics for which the admission was at his pleasure. If he thought that a person was sufficiently devoted to his ideas, that person could be admitted... He was very much alone [ among the Caltech physics faculty, and was] not popular with the establishment... His publications often included violent attacks on other people."

Zwicky-- a stocky, cocky man, always ready for a fight -- did not hesitate to proclaim his inside track to ultimate knowledge, or to tout the revelations it brought. In lecture after lecture during the 1930s, and article after published article, he trumpeted the concept of a neutron star-- a concept that he, Zwicky, had invented to explain the origins of the most energetic phenomena seen by astronomers: supernovae, and cosmic rays. He even went on the air in a nationally broadcast radio show to popularize his neutron stars. But under close scrutiny, his articles and lectures were unconvincing. They contained little substantiation for his ideas.

It was rumoured that Robert Millikan (the man who had built Caltech into a powerhouse among science institutions), when asked in the midst of all this hoopla why he kept Zwicky at Caltech, replied that it just might turn out that some of Zwicky's far-out ideas were right. Millikan, unlike some others in the science establishment, must have seen hints of Zwicky's intuitive genius - a genius that became widely recognized only thirty five years later, when observational astronomers discovered real neutron stars in the sky and verified some of Zwicky's extravagant claims about them.

Code: [Select]
D:\>Cscript /NOLOGO countstr.vbs zwicky.txt caltech /i
5

D:\>Cscript /NOLOGO countstr.vbs zwicky.txt establishment
2

D:\>Cscript /NOLOGO countstr.vbs zwicky.txt Establishment
0

D:\>Cscript /NOLOGO countstr.vbs zwicky.txt Establishment /i
2

D:\>Cscript /NOLOGO countstr.vbs zwicky.txt zwicky /i
10

D:\>
I was trying to dereference Null Pointers before it was cool.

victoria



    Beginner

    Thanked: 1
    How to get the Count of string in file
    « Reply #76 on: August 14, 2010, 05:42:50 PM »

    C:\test>type cntstr.bat
    @echo off

    sed s/%1/%1\n/g %2 | findstr %1 | find /c /v ""

    echo.
    rem type %2

    Output:

    C:\test>cntstr.bat  Zwicky  zwicky.txt
    count=10


    C:\test>cntstr.bat  Caltech  zwicky.txt
    count=5

    C:\test>cntstr.bat  establishment  zwicky.txt
    count=2

    C:\test>

    Have a Nice Day
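The pipeline above puts a line break after every match and then counts the lines that still contain the search string. A rough Python equivalent, assuming a plain case-sensitive literal match; `count_via_split_lines` is a made-up name for illustration.

```python
def count_via_split_lines(text: str, word: str) -> int:
    # Mimics the pipeline: sed s/word/word\n/g file | findstr word | find /c /v ""
    if not word:
        raise ValueError("word must not be empty")
    # sed step: put a line break after every (non-overlapping) occurrence
    split_text = text.replace(word, word + "\n")
    # findstr + find /c step: count the lines that still contain the word
    return sum(1 for line in split_text.split("\n") if word in line)


text = "Fritz Zwicky's colleagues\nZwicky taught; Caltech recalls Zwicky\nno match here"
print(count_via_split_lines(text, "Zwicky"))  # 3
print(text.count("Zwicky"))                   # 3, the same answer
```

Because `str.replace` works left to right on non-overlapping matches, each split line holds at most one occurrence, so the line count agrees with `str.count`. Keep in mind that `sed` treats the search word as a regular expression, so characters such as `*` or `.` in it may need escaping.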

    victoria



      How to get the Count of string in file
      « Reply #77 on: August 14, 2010, 08:46:07 PM »
      (The following batch code using tokens found the right count, but I had to massage the input file. Could someone with more experience with tokens correct the code? Thanks.)

      C:\test>type try813.bat
      @echo off
      set /a  c=0
      setlocal enabledelayedexpansion
      for /f "tokens=1-26" %%a in (%2)  do (
      if "%%a"=="%1" set /a c=!c! + 1
      if "%%b"=="%1" set /a c=!c! + 1
      if "%%c"=="%1" set /a c=!c! + 1
      if "%%d"=="%1" set /a c=!c! + 1
      if "%%e"=="%1" set /a c=!c! + 1
      if "%%f"=="%1" set /a c=!c! + 1
      if "%%g"=="%1" set /a c=!c! + 1
      if "%%h"=="%1" set /a c=!c! + 1
      if "%%i"=="%1" set /a c=!c! + 1
      if "%%j"=="%1" set /a c=!c! + 1
      if "%%k"=="%1" set /a c=!c! + 1
      if "%%l"=="%1" set /a c=!c! + 1
      if "%%m"=="%1" set /a c=!c! + 1
      if "%%n"=="%1" set /a c=!c! + 1
      if "%%o"=="%1" set /a c=!c! + 1
      if "%%p"=="%1" set /a c=!c! + 1
      if "%%q"=="%1" set /a c=!c! + 1
      if "%%r"=="%1" set /a c=!c! + 1
      if "%%s"=="%1" set /a c=!c! + 1
      if "%%t"=="%1" set /a c=!c! + 1
      if "%%u"=="%1" set /a c=!c! + 1
      if "%%v"=="%1" set /a c=!c! + 1
      if "%%w"=="%1" set /a c=!c! + 1
      if "%%x"=="%1" set /a c=!c! + 1
      if "%%y"=="%1" set /a c=!c! + 1
      if "%%z"=="%1" set /a c=!c! + 1
      )
      echo count=%c%
      echo Display %2
      rem type %2
      Output:
      C:\test> try813.bat  Zwicky  zwicky.txt
      count=10
      Have a Nice Day
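The token loop above only counts a hit when a whole space-delimited token equals the search word, which is why the input had to be massaged: a token such as `Zwicky's` or `Zwicky,` never equals `Zwicky`. It also only looks at the first 26 tokens of each line (`%%a` to `%%z`). A small Python sketch of the same whole-token comparison, without the 26-token limit; `count_whole_tokens` is a hypothetical name.

```python
def count_whole_tokens(text: str, word: str) -> int:
    """Count whitespace-separated tokens that are exactly equal to word."""
    return sum(1 for token in text.split() if token == word)


line = "Zwicky taught. Zwicky's colleagues said Zwicky, Zwicky and Zwicky"
print(count_whole_tokens(line, "Zwicky"))  # 3: "Zwicky's" and "Zwicky," are different tokens
print(line.count("Zwicky"))                # 5: substring count for comparison
```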

      ghostdog74



        Specialist

        Thanked: 27
        Re: How to get the Count of string in file
        « Reply #78 on: August 15, 2010, 01:07:39 AM »
        Quote
        Ghostdog,

        Some of the members here at computerhope.com believe we should not use gawk and sed because the vbs script and batch look better.

        This is the biggest joke of the year. awk/sed is excellent for parsing and modifying files, and awk is also a small programming language capable of replacing cmd.exe. Batch/VBScript look better? Better in what sense? Does more lines of code mean better? My gawk statement takes only one line, and it saves me enough time to move on to my other assignments, while you have to crack your head and come up with long, messy batch files like the last one you posted. By the time you have finished, I am already off to bed and enjoying my sleep.

        Salmon Trout

        • Guest
        Re: how to get the Count of string in file
        « Reply #79 on: August 15, 2010, 01:22:38 AM »
        Quote
        better in what sense?

        More readable by others.

        ghostdog74



          Re: how to get the Count of string in file
          « Reply #80 on: August 15, 2010, 01:33:06 AM »
          Quote
          More readable by others.

          VBScript maybe, but definitely not batch.

          Salmon Trout

          Re: how to get the Count of string in file
          « Reply #81 on: August 15, 2010, 01:42:17 AM »
          Quote
          VBScript maybe, but definitely not batch.

          I have to agree with you there. When I post one of those batch "solutions" where the batch file writes a VBScript on the fly, calls it, and then deletes the .vbs, I get an uneasy feeling, like a surgeon who, while removing a gallstone with a carpenter's saw, advises attaching a scalpel blade to it with duct tape.

          ghostdog74



            Re: how to get the Count of string in file
            « Reply #82 on: August 15, 2010, 02:29:50 AM »
            Quote
            I have to agree with you there. When I post one of those batch "solutions" where the batch file writes a VBScript on the fly, calls it, and then deletes the .vbs, I get an uneasy feeling, like a surgeon who, while removing a gallstone with a carpenter's saw, advises attaching a scalpel blade to it with duct tape.

            I always recommend against hybrids, i.e. combining batch and VBScript. Mostly from my own experience, I find them difficult to read and troubleshoot because of the intermixing of different syntaxes, etc. VBScript can do what batch does, so I would write the entire thing in VBScript. Anyway, this is already off-topic, so...

            Salmon Trout

            Re: how to get the Count of string in file
            « Reply #83 on: August 15, 2010, 09:04:13 AM »
            Project Gutenberg has some books in text file format. My code seems woefully slow compared to BC_Programmer's, although either script counted "God" in the King James Bible in less than half a second. But see below...

            Salmon-count.vbs

            This is how I am going to try to do VBscripts in future...

            Code: [Select]
            Option Explicit

            'Setup
            Dim ObjFSO
            Dim ObjTS
            Dim StrFileName
            Dim StrLookString
            Dim StrThisLine

            Dim SngStartSec
            Dim SngEndSec
            Dim SngElapsed
            Dim SngLineCount
            Dim SngTotalCount
            Dim SngSubsLen
            Dim SngSubStart
            Dim SngCaseSensitive

            'Input filename
            StrFileName = WScript.Arguments(0)

            'String to search for
            StrLookString = WScript.Arguments(1)

            'Case type - 1 = case sensitive 0 = case insensitive
            '(CInt turns the "0"/"1" argument string into a number for the tests below)
            SngCaseSensitive = CInt(WScript.Arguments(2))

            'Length of string to search for
            SngSubsLen = Len(StrLookString)

            'if case insensitive search
            'convert to lower case
            If SngCaseSensitive = 0 Then StrLookString = LCase(StrLookString)

            'Initialise File System Object
            Set ObjFSO = CreateObject("Scripting.FileSystemObject")

            'Open input file
            Set ObjTS = ObjFSO.OpenTextFile(StrFileName)

            'Start the running total at zero (prints 0, not blank, if nothing is found)
            SngTotalCount = 0

            'Store start time (secs since midnight)
            SngStartSec = Timer

            'Keep reading lines until all done
            Do While Not ObjTS.AtEndOfStream
                'Get line
                StrThisLine = ObjTS.ReadLine
                'if case insensitive search
                'convert to lower case
                If SngCaseSensitive = 0 Then StrThisLine = LCase(StrThisLine)
                'Set count to zero
                SngLineCount = 0
                Do
                    'Is string in line? If so, get place
                    SngSubStart = InStr(StrThisLine, StrLookString)
                    'If found, add 1 to counter
                    If SngSubStart > 0 Then SngLineCount = SngLineCount + 1
                    'Chop off everything up to the end of the match
                    StrThisLine = Mid(StrThisLine, (SngSubStart + SngSubsLen))
                    'Exit when no more found
                Loop Until SngSubStart = 0
                'Add count from this line to total
                SngTotalCount = SngTotalCount + SngLineCount
            Loop

            'Close input file
            ObjTS.Close

            'Store end time (secs since midnight)
            SngEndSec = Timer

            'Subtract to get elapsed
            SngElapsed = SngEndSec - SngStartSec

            'Show results
            WScript.Echo SngTotalCount
            WScript.Echo FormatNumber(SngElapsed, 3)
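Salmon-count.vbs reads one line at a time, so its memory use is bounded by the longest line rather than by the file size. A minimal Python sketch of the same InStr/Mid loop, assuming the input is any iterable of lines (an open text file works); `count_in_lines` and `case_sensitive` are illustrative names, not from the original. Like the VBScript, it does not count a match that spans a line break.

```python
import io
from typing import Iterable


def count_in_lines(lines: Iterable[str], needle: str, case_sensitive: bool = True) -> int:
    """Count non-overlapping occurrences of needle, one line at a time."""
    if not needle:
        raise ValueError("needle must not be empty")
    if not case_sensitive:
        needle = needle.lower()
    total = 0
    for line in lines:
        if not case_sensitive:
            line = line.lower()
        pos = line.find(needle)                           # like InStr(...) > 0
        while pos >= 0:
            total += 1
            pos = line.find(needle, pos + len(needle))    # like Mid(..., start + len)
    return total


# StringIO stands in for an opened file; both can be iterated line by line
sample = io.StringIO("God said\nand God saw\ngod\n")
print(count_in_lines(sample, "God"))  # 2
```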


            BCP-count.vbs

            Code: [Select]
            Function GetCountStr(ByVal searchIn, ByVal SearchFor,Byval CompareText)
                CompareText=CBool(CompareText)
                GetCountStr = (Len(searchIn) - Len(Replace(searchIn, SearchFor, "",1,-1,abs(CompareText)))) / Len(SearchFor)
            End Function


            dim inputstrm
            Dim lookin,lookfor

            Dim StartSec, Endsec, Elapsed

            'see if /i was specified....
            for each looparg in WScript.Arguments
                If UCase(looparg)="-I" or UCase(looparg)="/I" Then
                   ignorecase=true
                   Exit For
                End If
            Next

            Startsec=Timer
            Set inputstrm = CreateObject("Scripting.FileSystemObject").OpenTextFile(WScript.Arguments(0))
            lookfor = WScript.Arguments(1)
            lookin=inputstrm.ReadAll()

            'Note: Endsec is read before GetCountStr runs, so Elapsed covers
            'opening the file and ReadAll() only, not the counting itself
            Endsec=Timer
            Elapsed=Endsec - Startsec
            WScript.Echo GetCountStr(lookin,lookfor,ignorecase)
            wscript.echo Formatnumber(Elapsed, 3)


            Code: [Select]
            Salmon-count.vbs "H G Wells The War Of The Worlds.txt" "Martians" 1
            156
            0.043

            BCP-count.vbs "H G Wells The War Of The Worlds.txt" "Martians"
            156
            0.016

            Salmon-count.vbs "Complete Works Of Shakespeare.txt" "Hamlet" 1
            113
            0.688

            BCP-count.vbs "Complete Works Of Shakespeare.txt" "Hamlet"
            113
            0.250

            Salmon-count.vbs "Tolstoy War And Peace.txt" "Pierre" 1
            1963
            0.383

            BCP-count.vbs "Tolstoy War And Peace.txt" "Pierre"
            1963
            0.145

            Salmon-count.vbs "King James Bible.txt" "God" 1
            4167
            0.359

            BCP-count.vbs "King James Bible.txt" "God"
            4167
            0.188

            Salmon-count.vbs "Samuel Richardson Clarissa.txt" "she" 1
            8861
            1.156

            bcp-count.vbs "Samuel Richardson Clarissa.txt" "she"
            8861
            0.234

            but...

            I downloaded a text file containing 1 billion places of pi (1,000,000,002 bytes) with no carriage returns. I figured that my code wouldn't like that, so I used GNU fold to insert CR/LF pairs every 80 columns. However, when I tried BCP's code on it, oh dear! The system got awfully sluggish, and I watched my available RAM go down from 3.2 GB to 24 MB before I used Process Explorer to terminate cscript.exe. But my "slow" code just chewed its way through in 1 minute 44 and a bit seconds...

            Code: [Select]
            salmon-count.vbs "1 billion places of pi.txt" "567" 0
            975498
            104.430

            Code: [Select]
                  351,218 H G Wells The War Of The Worlds.txt
                3,288,738 Tolstoy War And Peace.txt
                4,397,206 King James Bible.txt
                5,582,655 Complete Works Of Shakespeare.txt
                5,616,676 Samuel Richardson Clarissa.txt
            1,025,000,002 1 billion places of pi.txt

            System:

            Shuttle SN78SH7, AMD Phenom II 945 (quad core), 4 GB Crucial 800 MHz DDR2 RAM,  Windows 7 64 bit, files read from Seagate 320GB external USB 2.0 drive.







            ghostdog74



              Re: how to get the Count of string in file
              « Reply #84 on: August 15, 2010, 09:54:15 AM »
              Most probably due to the ReadAll(). BCP's code reads the whole file into memory. If your 1 billion pi (or is it 1 million?) text file is very big, then that explains the sluggishness of fitting it all into memory.

              victoria



                How to get the Count of string in file
                « Reply #85 on: August 15, 2010, 10:16:42 AM »


                Quote
                arun*America*MSC~INS*dfffs*Sdfsd*sdfsd~ssfsd*sdfsd~INS*dfffs*sdfsdf*sdfs~

                I need to get a count of INS* in the above file. Am new to DOS Commands.


                Arunavlp,

                The posts at the end of your thread and in the middle are so far off topic that the posters must start a new thread for their strange ideas.


                I'm pleased that your problem of counting how often a string appears in a file has been answered several times.

                The Sed solution is the best solution.
                Have a Nice Day

                Sidewinder



                  Guru

                  Thanked: 139
                • Experience: Familiar
                • OS: Windows 10
                Re: how to get the Count of string in file
                « Reply #86 on: August 15, 2010, 11:06:33 AM »
                Quote
                The Sed solution is the best solution.

                Well, that's certainly the definitive answer.

                Actually, I still like my response back in post 6. By using the Replace function to substitute nulls for all the occurrences of the search argument, the original string is effectively shortened (nulls have zero length). Using some 3rd-grade arithmetic, you can calculate the difference in length between the original string and the replacement string. This gives the number of characters that were removed. Dividing that by the length of the search argument gives the number of occurrences of the substring in the original string.
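To make the arithmetic concrete, here it is applied to the sample line from the original question (the `INS*` example quoted earlier in this thread), sketched in Python. The `*` is matched literally here; in a `sed` regular expression it would instead mean "zero or more S".

```python
# The sample line from the original question in this thread
s = "arun*America*MSC~INS*dfffs*Sdfsd*sdfsd~ssfsd*sdfsd~INS*dfffs*sdfsdf*sdfs~"
needle = "INS*"

stripped = s.replace(needle, "")   # delete every occurrence ("replace with nothing")
removed = len(s) - len(stripped)   # how many characters disappeared
count = removed // len(needle)     # each occurrence accounted for len(needle) of them

print(removed, count)              # 8 2
```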

                PowerShell can do this as a one-liner that is more readable than sed. There is always more than one solution to any coding problem. It makes me wonder why many posters request a specific type of solution.

                 8)
                The true sign of intelligence is not knowledge but imagination.

                -- Albert Einstein

                Salmon Trout

                Re: how to get the Count of string in file
                « Reply #87 on: August 15, 2010, 11:55:39 AM »
                Quote
                If your 1 billion pi (or is it 1 million?) text file is very big

                One thousand and twenty-five million and two bytes (1,025,000,002), as I posted above.

                Quote
                then that explains the sluggishness of fitting it all into memory.

                Did I imply that I did not already realise this?



                Salmon Trout

                Re: how to get the Count of string in file
                « Reply #88 on: August 15, 2010, 11:58:33 AM »
                Quote
                There is always more than one solution to any coding problem. It makes me wonder why many posters request a specific type of solution.

                Unlike hobbyists at home using their own systems, many people asking for help have already partially completed a script and/or are using their employer's computers, on which restrictions prevent the installation of third-party software.

                BC_Programmer


                Re: how to get the Count of string in file
                « Reply #89 on: August 15, 2010, 01:04:46 PM »
                I just sort of threw mine together, wasn't too interested in making sure it worked for gigantic files :P

                Here's a version that reads in chunks instead:

                Code: [Select]
                Function GetCountStr(ByVal searchIn, ByVal SearchFor, ByVal CompareText)
                    CompareText = CBool(CompareText)
                    GetCountStr = (Len(searchIn) - Len(Replace(searchIn, SearchFor, "", 1, -1, Abs(CompareText)))) / Len(SearchFor)
                End Function


                Dim inputstrm
                Dim lookin, lookfor

                'see if /i was specified....
                For Each looparg In WScript.Arguments
                    If UCase(looparg) = "-I" Or UCase(looparg) = "/I" Then
                       ignorecase = True
                       Exit For
                    End If
                Next
                Set FSO = CreateObject("Scripting.FileSystemObject")

                'read in chunks of 32K characters:
                chunksize = 32 * 1024

                Set inputstrm = FSO.OpenTextFile(WScript.Arguments(0))
                lookfor = WScript.Arguments(1)
                RunnerCount = 0
                strhangoff = ""
                Do Until inputstrm.AtEndOfStream
                    'prepend the tail of the previous chunk so a match split across chunks is found
                    readchunk = strhangoff & inputstrm.Read(chunksize)
                    RunnerCount = RunnerCount + GetCountStr(readchunk, lookfor, ignorecase)
                    'keep Len(lookfor)-1 characters: enough to complete a split match, but too short
                    'to hold a whole one, so a match at the very end of a chunk is not counted twice
                    strhangoff = Right(readchunk, Len(lookfor) - 1)
                Loop
                inputstrm.Close


                WScript.Echo RunnerCount

                I don't actually have a super extra large file to test it on, so I made one by duplicating the "zwicky.txt" file over itself several hundred times.

                This one is certainly faster than the ReadAll() method. I've added a small provision so that it doesn't "miss" entries when half of the string is at the end of one chunk and the rest is at the start of the next (so that neither chunk contains all of it): a "hangoff" from the end of the previous chunk is copied to the start of the next chunk. I make sure the hangoff is one character shorter than the search string itself; this prevents the string from being found twice in the edge case where it sits at the very end of a chunk (it would otherwise be counted twice: once in the first chunk and once in the next chunk, to which it would be copied).
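A Python sketch of the same chunk-plus-hangoff idea. It is a variant rather than a direct translation: instead of always carrying the last len(search)-1 characters, it carries only the text after the last counted match, capped at len(search)-1 characters. That keeps the result equal to a whole-file, non-overlapping count even for strings that can overlap themselves, such as "aa" or "aba", where a fixed hangoff can let a second, overlapping match be counted. The names `count_in_chunks` and `read_chunks`, the UTF-8 encoding, and the 32 KB default are assumptions for illustration.

```python
from typing import Iterable, Iterator


def count_in_chunks(chunks: Iterable[str], needle: str) -> int:
    """Count non-overlapping occurrences of needle across a stream of text chunks."""
    if not needle:
        raise ValueError("needle must not be empty")
    m = len(needle)
    total = 0
    carry = ""                       # tail of the previous chunk (the "hangoff")
    for chunk in chunks:
        buf = carry + chunk
        pos = 0                      # end of the last match counted in buf
        while (i := buf.find(needle, pos)) >= 0:
            total += 1
            pos = i + m
        # Text before len(buf) - (m - 1) cannot start a match that runs into the
        # next chunk, and text before pos already belongs to a counted match.
        carry = buf[max(pos, len(buf) - (m - 1)):]
    return total


def read_chunks(path: str, size: int = 32 * 1024) -> Iterator[str]:
    """Yield the file's text size characters at a time (like inputstrm.Read(chunksize))."""
    with open(path, encoding="utf-8") as f:
        while chunk := f.read(size):
            yield chunk
```

With these, something like `count_in_chunks(read_chunks("big.txt"), "567")` (hypothetical file name) should process a file of any size in roughly constant memory.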

                The main difficulty was getting used to the Microsoft FileSystemObjects; I'm used to using my own library. I'm not sure whether there would be much of a speed difference there, but it's what I'm used to (not counting the .NET IO namespace). The interesting thing is that the only differences are method names (I chose "Eof" rather than "AtEndOfStream" to indicate that the stream was at the end of the file) and, of course, ProgIDs; everything else could be pretty much exactly the same.


                Quote
                Actually, I still like my response back in post 6. By using the Replace function to substitute nulls for all the occurrences of the search argument, the original string is effectively shortened (nulls have zero length). Using some 3rd-grade arithmetic, you can calculate the difference in length between the original string and the replacement string. This gives the number of characters that were removed. Dividing that by the length of the search argument gives the number of occurrences of the substring in the original string.

                Actually, your idea is pretty much a slight variant of what my routine does:

                Code: [Select]
                Function GetCountStr(ByVal searchIn, ByVal SearchFor,Byval CompareText)
                    CompareText=CBool(CompareText)
                    GetCountStr = (Len(searchIn) - Len(Replace(searchIn, SearchFor, "",1,-1,abs(CompareText)))) / Len(SearchFor)
                End Function

                It replaces the text being searched for with an empty string and then does the math. It's actually easier to do it this way than to replace it with null, since the size difference between the original and the "replaced" version will be an exact multiple of the length of the search string. I wrote this a good few years ago and had to "translate" it from the VB6 it was written in to VBS.

                Quote
                PowerShell can do this as a one-liner that is more readable than sed. There is always more than one solution to any coding problem. It makes me wonder why many posters request a specific type of solution.

                It could be a one-liner in VBScript, but it would be both hard to read and somewhat silly. And it would require ReadAll() again.

                Using .NET 4.0/C# 4.0, it might even be possible to read in a number of chunks at once and then count the occurrences in each one in parallel using the Parallel.For construct. The same would be possible in 3.5, but it would require manually spinning up the threads and a less than enviable use of locks to prevent resource contention. It's pretty interesting that such a simple problem can have such varied solutions, but not at all surprising.
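A rough Python sketch of the count-chunks-in-parallel idea, using `concurrent.futures` rather than .NET's Parallel.For. Each chunk is counted independently, and matches that straddle a chunk boundary are then added separately. This simple bookkeeping assumes that the search string cannot overlap itself (e.g. "Zwicky", not "aa") and that every chunk is at least as long as the search string. Threads keep the sketch short but may not speed up CPU-bound string work in CPython; a ProcessPoolExecutor is the usual route to real parallelism. `parallel_count` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_count(text: str, needle: str, chunk_size: int, workers: int = 4) -> int:
    """Count a non-self-overlapping needle by counting fixed-size chunks in parallel."""
    m = len(needle)
    if m == 0 or chunk_size < m:
        raise ValueError("needle must be non-empty and no longer than chunk_size")
    starts = range(0, len(text), chunk_size)
    chunks = [text[s:s + chunk_size] for s in starts]
    # "Map" step: each worker counts matches lying entirely inside its own chunk
    with ThreadPoolExecutor(max_workers=workers) as pool:
        inside = sum(pool.map(lambda c: c.count(needle), chunks))
    # A straddling match starts within m-1 characters before a boundary b and ends after it
    straddling = sum(
        1
        for b in starts[1:]
        for s in range(max(0, b - m + 1), b)
        if text.startswith(needle, s)
    )
    return inside + straddling
```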

