Computer Hope

Microsoft => Microsoft DOS => Topic started by: arunavlp on August 03, 2010, 04:43:17 AM

Title: how to get the Count of string in file
Post by: arunavlp on August 03, 2010, 04:43:17 AM
Hi,

I have a file that consists of a single line and is 35 MB in size.


Example:
arun*America*MSC~INS*dfffs*Sdfsd*sdfsd~ssfsd*sdfsd~INS*dfffs*sdfsdf*sdfs~

I need to count the occurrences of INS* in this file. I am new to DOS commands.

Please help me.

Thanks in advance.

Regards,
Arun S.
Title: Re: how to get the Count of string in file
Post by: ghostdog74 on August 03, 2010, 08:19:16 AM
Download gawk for Windows (http://gnuwin32.sourceforge.net/packages/gawk.htm), then run:
Code:
c:\test> gawk "{m=gsub("INS",""); total+=m}END{print "total:" total}" file
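The sketch below reproduces the logic of the one-liner in Python. Python is used only for illustration; no one else in this thread uses it, and the function name count_like_gsub is made up. In gawk, gsub() returns how many replacements it made on the current line, `total` adds those counts up, and the END block prints the result. Python's re.subn() returns the same kind of count.

```python
import re

def count_like_gsub(lines, pattern):
    """Total number of non-overlapping regex matches over all lines,
    like gawk's {m=gsub(pattern,""); total+=m} followed by END{print total}."""
    regex = re.compile(pattern)
    total = 0
    for line in lines:
        # re.subn returns (new_string, number_of_replacements), like gsub's return value
        _, m = regex.subn("", line)
        total += m
    return total

sample = ["arun*America*MSC~INS*dfffs*Sdfsd*sdfsd~ssfsd*sdfsd~INS*dfffs*sdfsdf*sdfs~"]
print("total:", count_like_gsub(sample, "INS"))  # prints: total: 2
```

The one-liner searches for INS. To count the literal text INS*, note that gsub() treats its first argument as a regular expression, in which * repeats the previous character. The star therefore has to be escaped. In the sketch that is count_like_gsub(sample, re.escape("INS*")).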
Title: Re: how to get the Count of string in file
Post by: arunavlp on August 04, 2010, 11:21:59 PM
Hi,

Thanks for the suggestion, but I got this error message:

30.834
gawk: {m=gsub(INS,");
gawk:             ^ unterminated string

I don't know what this error means. Please help.


Regards,
Arun S.
Title: Re: how to get the Count of string in file
Post by: ghostdog74 on August 04, 2010, 11:57:50 PM
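The inner double quotes end the quoted argument early, so gawk receives a mangled program. The quotes around INS are gone, which is why it reports an unterminated string. Escape the inner double quotes with backslashes:

Code: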
c:\test> gawk "{m=gsub(\"INS\",\"\"); total+=m}END{print \"total:\" total}" file
Title: Re: how to get the Count of string in file
Post by: arunavlp on August 05, 2010, 12:34:57 AM
Hi,

Thanks, it works. :) But please let me know whether this can also be done with the FIND command.

Regards,
Arun S.
Title: Re: how to get the Count of string in file
Post by: ghostdog74 on August 05, 2010, 01:14:50 AM
I personally wouldn't bother. FIND (or FINDSTR) only finds the lines that contain the string; it won't count how many times the string occurs. That needs more involved programming, which I will leave to someone else who has the expertise and time to show you.
When parsing files and doing string manipulation, use a good tool for the job.
Title: Re: how to get the Count of string in file
Post by: Sidewinder on August 05, 2010, 04:31:42 AM
The find command will count the lines with the search argument. If a line has more than one occurrence of the search argument, it still counts for one. Findstr does not do counting but allows for multiple search arguments and a limited form of regular expressions.
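The small Python sketch below makes the difference concrete. It is an illustration only, and the function names are hypothetical. It computes two numbers:

- the count of lines that contain the string, which is what FIND /C reports;
- the count of occurrences, which is what the original question asks for.

The sample data is the six-line yz.txt/id.txt file used later in this thread, plus Arun's example line.

```python
def count_matching_lines(text, needle):
    # What FIND /C reports: the number of lines that contain needle at least once
    return sum(1 for line in text.splitlines() if needle in line)

def count_occurrences(text, needle):
    # Non-overlapping occurrences of needle anywhere in the text
    return text.count(needle)

# The contents of yz.txt / id.txt as listed later in this thread
data = "the\nthe\nthe\nthe\nthe the the\nthe the the\n"
print(count_matching_lines(data, "the"))  # 6
print(count_occurrences(data, "the"))     # 10

one_line = "arun*America*MSC~INS*dfffs*Sdfsd*sdfsd~ssfsd*sdfsd~INS*dfffs*sdfsdf*sdfs~"
print(count_matching_lines(one_line, "INS*"))  # 1
print(count_occurrences(one_line, "INS*"))     # 2
```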

You can use VBScript, which comes with Windows. This little demo script prompts the user for the file name and the search string. It can be tweaked to remove the prompts (which will probably gut most of the script). ;D

Code:
Const ForReading = 1

Set fso = CreateObject("Scripting.FileSystemObject")

' Prompt until an existing file name is entered, then read the whole file
Do
  WScript.StdOut.Write "Please enter file name: "
  strFile = WScript.StdIn.ReadLine
  If fso.FileExists(strFile) Then
    Set objFile = fso.OpenTextFile(strFile, ForReading)
    strCharacters = objFile.ReadAll
    Exit Do
  Else
    WScript.StdOut.Write "Invalid file name ... Try Again" & vbCrLf
  End If
Loop

' Prompt until a non-empty search string is entered
Do
  WScript.StdOut.Write "Please enter character string: "
  strToCount = WScript.StdIn.ReadLine
  If strToCount <> "" Then Exit Do
Loop

' Delete every match (case-insensitive, because both strings are lower-cased);
' the number of characters removed divided by the search length is the count
strTemp = Replace(LCase(strCharacters), LCase(strToCount), "")
WScript.Echo "Occurences of:", strToCount, "=", (Len(strCharacters) - Len(strTemp)) / Len(strToCount)

objFile.Close

Save the script with a .vbs extension. Run it only from the command prompt, using cscript, because WScript.StdIn and WScript.StdOut are available only under cscript: cscript scriptname.vbs

Good luck.  ;D
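The counting itself happens in the last two statements. Replace deletes every match, and the number of characters removed is divided by the length of the search string. The Python sketch below shows just that arithmetic; the function name is hypothetical. It assumes plain ASCII text, because lower-casing some non-ASCII characters changes the string's length and would throw off the division. For Arun's example line, searching for INS* removes 8 characters, and 8 / 4 = 2.

```python
def count_by_length_difference(text, needle, ignore_case=True):
    """Remove every match and divide the number of characters removed
    by len(needle). Assumes ASCII text."""
    if not needle:
        raise ValueError("search string must not be empty")
    if ignore_case:
        # The VBScript lower-cases both strings before Replace, so matching ignores case
        text, needle = text.lower(), needle.lower()
    remaining = text.replace(needle, "")
    return (len(text) - len(remaining)) // len(needle)

sample = "arun*America*MSC~INS*dfffs*Sdfsd*sdfsd~ssfsd*sdfsd~INS*dfffs*sdfsdf*sdfs~"
print(count_by_length_difference(sample, "ins*"))  # 2
```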
Title: Re: how to get the Count of string in file
Post by: vishuvishal on August 06, 2010, 06:38:36 PM
Here is an idea of what it should look like:

set /p pass= <string.txt
echo %pass%
call set new=%%pass:~%a%,1%%
set /a a=%a% + 1
  set key=%key%%new%
echo %new%

The variable new will give you the number of strings.
I will give you further details tomorrow.

Thanks and regards
vishu
Title: Re: how to get the Count of string in file
Post by: vishuvishal on August 07, 2010, 02:33:12 AM
Code:
set /p pass=<string.txt
echo %pass%
:st
call set new=%%pass:~%a%,1%%
echo a=%a% + 1
echo %a%
set key=%key%%new%
echo %new%
echo %key%
pause
::if %new% ==; goto :EOF

pause
goto :st


All that remains is to fix the loop.
Change string.txt to your file's drive:\path\filename.
That is the best option I can give you.
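For reference, the loop described here can be sketched in Python as a simple position scan. The loop takes one character at a time with %pass:~%a%,1%, advances a, and stops at the end of the string. This sketch is an illustration under assumptions, not a fix for the batch file, and the function name is made up. At each position it compares a slice as long as the search string rather than a single character, because one character alone cannot identify a multi-character string like INS*.

```python
def count_by_scanning(text, needle):
    """Step through the text one position at a time, the way the batch loop
    with %pass:~%a%,1% and an incrementing a was meant to work."""
    if not needle:
        raise ValueError("search string must not be empty")
    count = 0
    a = 0  # current position, like the batch variable a
    # The loop ends when no full-length match can start at position a,
    # so no special end-of-file marker is needed
    while a <= len(text) - len(needle):
        if text[a:a + len(needle)] == needle:
            count += 1
            a += len(needle)  # jump past the match so matches never overlap
        else:
            a += 1
    return count

sample = "arun*America*MSC~INS*dfffs*Sdfsd*sdfsd~ssfsd*sdfsd~INS*dfffs*sdfsdf*sdfs~"
print(count_by_scanning(sample, "INS*"))  # 2
```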

Title: Re: how to get the Count of string in file
Post by: victoria on August 07, 2010, 01:04:33 PM
@echo  off

sed s/the/the\n/g yz.txt | egrep -c the



counthe.bat
10
type yz.txt
the
the
the
the
the the the
the the the
Title: Re: how to get the Count of string in file
Post by: victoria on August 07, 2010, 02:21:14 PM
Two \\ should be one

C:\test>type   cntstr.bat
rem @echo  off
sed s/%1/%1\n/g %2 | egrep -c %1

C:\test>cntstr.bat  the yz.txt

C:\test>rem @echo  off

C:\test>sed s/the/the\n/g yz.txt   | egrep -c the
10

C:\test>type yz.txt
the
the
the
the
the the the
the the the
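The trick in this sed pipeline is that egrep -c counts lines, not matches. So sed first puts a line break after every match, making each match end its own line. The Python sketch below follows the same two steps. The function name is hypothetical, and unlike sed, the sketch treats the search string literally rather than as a regular expression.

```python
def count_by_splitting(text, needle):
    # Python version of:  sed s/NEEDLE/NEEDLE\n/g file | egrep -c NEEDLE
    # Step 1: put a line break after every match (the sed substitution).
    split_text = text.replace(needle, needle + "\n")
    # Step 2: count the lines that contain the needle (egrep -c). After step 1
    # every match ends its own line, so each such line holds exactly one match.
    return sum(1 for line in split_text.splitlines() if needle in line)

data = "the\nthe\nthe\nthe\nthe the the\nthe the the\n"
print(count_by_splitting(data, "the"))  # 10
```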
Title: Re: how to get the Count of string in file
Post by: victoria on August 07, 2010, 05:42:49 PM
Only one \\ backslash each time

type cntstr.bat
rem @echo  off
sed s/%1/%1\n/g %2 | egrep -c %1

cntstr.bat   22  yr2010.doc

rem @echo  off

sed s/22/22\n/g yr2010.doc   | egrep -c 22
12

(http://i7.photobucket.com/albums/y268/billrich/2010.jpg)
Title: Re: how to get the Count of string in file
Post by: victoria on August 08, 2010, 03:47:04 PM
(http://i7.photobucket.com/albums/y268/billrich/generic-1.jpg)
Title: How to get the Count of string in file
Post by: victoria on August 08, 2010, 04:05:30 PM
Output for Reply #6 by Sidewinder:


cscript   swcnt.vbs
Microsoft (R) Windows Script Host Version 5.8
Copyright (C) Microsoft Corporation. All rights reserved.

Please enter file name: yr2010.doc
Please enter character string: 22
Occurences of: 22 = 12

Title: Re: how to get the Count of string in file
Post by: vishuvishal on August 08, 2010, 04:17:14 PM
Victoria, I really don't understand what these commands will do.

This doesn't seem like a proper bat file.
Title: Re: how to get the Count of string in file
Post by: victoria on August 08, 2010, 04:51:04 PM
Quote
Victoria, I really do not understand what these commands will do.

Seems like not a proper bat file.

It is a VBS written by Sidewinder in Reply #6.  It works perfectly.

I do not write VBS.

Many ways to skin a cat.

Title: How to get the Count of string in file
Post by: victoria on August 09, 2010, 05:18:13 AM
I use a proxy server to reach Computer Hope. The editor I use to post does not work well: I get random extra backslashes. There was only one backslash each time here.

Nevertheless, my string count and Sidewinder's are the same.

Please do not question my sanity.

Title: Re: How to get the Count of string in file
Post by: BC_Programmer on August 09, 2010, 05:43:28 AM
Quote
I use a Proxy Server to reach Computer Hope.

We know.
Title: Re: how to get the Count of string in file
Post by: arunavlp on August 09, 2010, 05:52:52 AM
hi,

It worked, thanks. Smart work.


Regards,
Arun S.
Title: Re: How to get the Count of string in file
Post by: victoria on August 09, 2010, 05:55:41 AM
Quote
We know.

How do I avoid the random backslashes when I post?
Title: Re: how to get the Count of string in file
Post by: kpac on August 09, 2010, 10:35:04 AM
victoria...just wondering, what's the proxy server's IP?
Title: Re: how to get the Count of string in file
Post by: victoria on August 09, 2010, 11:54:11 AM
Quote
victoria...just wondering, what is  the proxy server IP?

????

The IP address is dynamic, and several proxy servers rotate the IP address?

Not really sure?
Title: Re: how to get the Count of string in file
Post by: vishuvishal on August 09, 2010, 02:04:30 PM
We can also do this using DOS.

Please run this in a loop, and add an exit condition for when we reach EOF, such as '\0', null, " ", etc. I do not know exactly what to use in batch.

Please complete or correct my code.
Code:
set /p pass=<string.txt
echo %pass%
:st
call set new=%%pass:~%a%,1%%
echo a=%a% + 1
echo %a%

Thanks and regards
vishu
Title: Re: how to get the Count of string in file
Post by: victoria on August 09, 2010, 08:19:40 PM

Quote
Please complete or correct my codes.
Code:
set /p pass=<string.txt
echo %pass%
:st
call set new=%%pass:~%a%,1%%
echo a=%a% + 1
echo %a%


(http://i7.photobucket.com/albums/y268/billrich/vincnt89.jpg)
Title: Re: how to get the Count of string in file
Post by: victoria on August 09, 2010, 08:21:49 PM
To Vis,


(http://i7.photobucket.com/albums/y268/billrich/vincnt89.jpg)
Title: How to get the Count of string in file
Post by: victoria on August 10, 2010, 05:25:58 PM
Hello.  Wrong Thread.
Title: Re: how to get the Count of string in file
Post by: victoria on August 11, 2010, 08:04:58 PM
Quote
We can do this using DOS also

Please run this in loop. Once please enter the condition once we get EOF. like '\0' or null or " " etc. I do not know exactly what to use for batch.

Please complete or correct my codes.
Code:
set /p pass=<string.txt
echo %pass%
:st
call set new=%%pass:~%a%,1%%
echo a=%a% + 1
echo %a%

Thanks and regards
vishu

C:\test>type viscnt.bat
@echo off
set /a a=2
echo Here is a string > string.txt
set /p pass=<string.txt
echo pass=%pass%
REM :st is label or a point in the code
REM where we jump to or return to

call :st %a%
echo  return from :st

rem we may use call to jump to or return to a location
rem ( a label ) in the code, or to another batch file

rem set new=%%pass:~%a%,1%%  I do not  know what this does
set new=%pass:~%a%,1%
echo new=%new%
rem set assigns a value to a variable.
rem A variable is a location in RAM where the value is stored
set /a a=%a% + 1
echo a=%a%
goto :end
:st %a%
echo a=%1
echo We are at  the  :st label location
echo a=%1
exit /b
:end

Output:

C:\test>viscnt.bat
pass=Here is a string
a=2
We are at  the  :st label location
a=2
 return from :st
new=Here is a string a
a=3

Title: Re: How to get the Count of string in file
Post by: Salmon Trout on August 12, 2010, 12:39:00 AM
Quote
Hello.  Wrong Thread.

Getting confused, Bill?  :)
Title: How to get the Count of string in file
Post by: victoria on August 12, 2010, 02:23:56 AM
Say What?
Title: Re: How to get the Count of string in file
Post by: Salmon Trout on August 12, 2010, 09:07:15 AM
Quote
Say What?

We know it's you, Bill
Title: Re: How to get the Count of string in file
Post by: victoria on August 12, 2010, 11:38:02 AM
Quote
We know it's you, Bill?

What is Salmon Trout talking about?

Is Salmon Trout part of the ComputerHope.com Staff?
Title: Re: how to get the Count of string in file
Post by: Salmon Trout on August 12, 2010, 11:42:07 AM
That third person thing is a "dead giveaway" as Londoners say, Bill.
Title: Re: how to get the Count of string in file
Post by: victoria on August 12, 2010, 11:50:49 AM
Quote
arun*America*MSC~INS*dfffs*Sdfsd*sdfsd~ssfsd*sdfsd~INS*dfffs*sdfsdf*sdfs~

I need to get a count of INS* in the above file. Am new to DOS Commands.


Arunavlp,

I'm sorry the thread got off topic.

Sidewinder* and Ghostdog provided excellent methods for counting the number of times a string appears in a document.

Please ignore the off topic posts.

Good Luck

*   Reply #6 on: August 05, 2010, 04:31:42 AM
Title: Re: how to get the Count of string in file
Post by: Salmon Trout on August 12, 2010, 12:31:31 PM
Victoria, if you aren't Billrich, how come you have access to his Photobucket account?
Title: Re: how to get the Count of string in file
Post by: victoria on August 12, 2010, 02:37:11 PM

Quote
Thanks It works..  but please let me know if we can do it in Find Command....

Arun S.

Arun,

I'm sorry the off-topic posts continue.

Some posters never make any suggestions for counting strings in a document.

These posters write about topics completely unrelated to counting strings.

_____________________________

sed s/22/22\n/g yr2010.doc   | egrep -c 22
The number of 22 strings in the calendar is 12: one 22 for each month.

The \n in the sed replacement puts a newline after each match. A line that contains the string several times is therefore split up, and egrep -c, which counts lines, counts every occurrence.


Title: Re: how to get the Count of string in file
Post by: vishuvishal on August 12, 2010, 03:12:50 PM
Can we use FINDSTR?
Title: Re: how to get the Count of string in file
Post by: vishuvishal on August 12, 2010, 03:18:44 PM
Here is an idea: run this command in a loop until we reach the end of the file,
incrementing a counter variable on each pass of the loop.
When we reach the end of the file, stop the loop.
Then check the variable; it will hold the number of strings.


We can also use:

for /f "delims=" %%i in (id.txt)  do (
echo i = %%i
)

If we can combine all of these, that would do it.
Can anyone write this?

Thanks and regards
vishu
Title: Re: how to get the Count of string in file
Post by: Salmon Trout on August 12, 2010, 03:25:59 PM

Quote
These posters write about topics completely unrelated to counting strings.


Like your breach of the forum rules. Squirm how you like, you've been rumbled!

Title: Re: how to get the Count of string in file
Post by: victoria on August 12, 2010, 03:26:37 PM
Quote
Can we use FINDSTR

Let us see your code.

Findstr with a counter might work? Findstr usually prints only the line where the string is found. When the string appears twice in the same line, it is counted only once, so the final count is wrong.

I will try findstr again. Sidewinder and the other experts stated findstr will not work.

Let us see your code.
Title: Re: how to get the Count of string in file
Post by: victoria on August 12, 2010, 04:45:15 PM

Quote
for /f "delims=" %%i in (id.txt)  do (
echo i = %%i
)

vishu


C:\test>type vis812.bat
@echo off
set /a  c=0
setlocal enabledelayedexpansion
for /f "tokens=1-5" %%i in (id.txt)  do (
echo %%i %%j %%k
if "%%i"=="the" set /a c=!c! + 1
if "%%j"=="the" set /a c=!c! + 1
if "%%k"=="the" set /a c=!c! + 1
)
echo count=%c%
echo.
echo Display id.txt
echo.
type id.txt

Output.

C:\test>vis812.bat
the
the
the
the
the the the
the the the
count=10

Display id.txt

the
the
the
the
the the the
the the the

P.S. The above code uses only batch commands, not findstr.
It counts only whole words equal to the string, and only among the first three words on each line, so it gives the right answer for id.txt, which contains 10 occurrences of "the".
The code will most likely not work with other text files
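The P.S. is worth spelling out. FOR /F splits each line into tokens (by default at spaces and tabs), and the IF tests compare whole tokens. This approach therefore finds the string only when it is a complete word, and only among the tokens the batch file actually tests.

The Python sketch below models that behaviour roughly. The function name and the max_tokens parameter are illustrative assumptions. One difference: str.split() also splits at other whitespace characters, which FOR /F does not do by default.

The sketch also shows why this approach cannot help with Arun's file. His data contains no spaces, so the whole line is a single token.

```python
def count_whole_tokens(text, word, max_tokens=None):
    """Count whitespace-separated tokens equal to word, like FOR /F "tokens=..."
    followed by IF "%%i"=="word" tests. max_tokens mimics how many tokens
    the batch file actually compares (3 in vis812.bat, 26 in cnt812.bat)."""
    count = 0
    for line in text.splitlines():
        tokens = line.split()
        if max_tokens is not None:
            tokens = tokens[:max_tokens]
        count += sum(1 for t in tokens if t == word)
    return count

data = "the\nthe\nthe\nthe\nthe the the\nthe the the\n"
print(count_whole_tokens(data, "the", max_tokens=3))  # 10
print(count_whole_tokens("then the there", "the"))     # 1, although "the" occurs 3 times as a substring
```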
Title: Re: how to get the Count of string in file
Post by: victoria on August 12, 2010, 05:50:23 PM
Quote
Can we use FINDSTR


C:\test>findstr  22  yr2010.doc
17 18 19 20 21 22 23   21 22 23 24 25 26 27   21 22 23 24 25 26 27
18 19 20 21 22 23 24   16 17 18 19 20 21 22   20 21 22 23 24 25 26
18 19 20 21 22 23 24   22 23 24 25 26 27 28   19 20 21 22 23 24 25
17 18 19 20 21 22 23   21 22 23 24 25 26 27   19 20 21 22 23 24 25

C:\test>findstr  22  yr2010.doc | find /c /v ""
4

C:\test>

Vis,

Even though each line above has three 22s, only one per line is counted, because findstr returns whole lines and find /c counts those lines.

I do not know how to modify this so that findstr counts every occurrence.
Title: How to get the Count of string in file
Post by: victoria on August 12, 2010, 06:38:11 PM
Quote
Can we use FINDSTR

C:\test>type   yr812.bat
@echo off
set /a  c=0
setlocal enabledelayedexpansion
for /f "tokens=1-26" %%a in (yr2010.doc)  do (

if "%%a"=="22" set /a c=!c! + 1
if "%%b"=="22" set /a c=!c! + 1
if "%%c"=="22" set /a c=!c! + 1
if "%%d"=="22" set /a c=!c! + 1
if "%%e"=="22" set /a c=!c! + 1
if "%%f"=="22" set /a c=!c! + 1
if "%%g"=="22" set /a c=!c! + 1
if "%%h"=="22" set /a c=!c! + 1
if "%%i"=="22" set /a c=!c! + 1
if "%%j"=="22" set /a c=!c! + 1
if "%%k"=="22" set /a c=!c! + 1
if "%%l"=="22" set /a c=!c! + 1
if "%%m"=="22" set /a c=!c! + 1
if "%%n"=="22" set /a c=!c! + 1
if "%%o"=="22" set /a c=!c! + 1
if "%%p"=="22" set /a c=!c! + 1
if "%%q"=="22" set /a c=!c! + 1
if "%%r"=="22" set /a c=!c! + 1
if "%%s"=="22" set /a c=!c! + 1
if "%%t"=="22" set /a c=!c! + 1
if "%%u"=="22" set /a c=!c! + 1
if "%%v"=="22" set /a c=!c! + 1
if "%%w"=="22" set /a c=!c! + 1
if "%%x"=="22" set /a c=!c! + 1
if "%%y"=="22" set /a c=!c! + 1
if "%%z"=="22" set /a c=!c! + 1
)
echo count=%c%

echo.
echo Display yr2010.doc
echo.

Output:

C:\test>yr812.bat
count=12

Display yr2010.doc

C:\test>
Title: Re: how to get the Count of string in file
Post by: victoria on August 12, 2010, 07:23:00 PM
Quote
Can we use FINDSTR

Vis,
( Code has not been fully tested but my price is right.)
REM This generic batch string counter should work for most files and strings,
REM as long as the string is a whole word among the first 26 words of a line
Rem Usage:  cnt812.bat string  file.txt
REM Usage:  cnt812.bat  the  id.txt
C:\test>type cnt812.bat
@echo off
set /a  c=0
setlocal enabledelayedexpansion
for /f "tokens=1-26" %%a in (%2)  do (

if "%%a"=="%1" set /a c=!c! + 1
if "%%b"=="%1" set /a c=!c! + 1
if "%%c"=="%1" set /a c=!c! + 1
if "%%d"=="%1" set /a c=!c! + 1
if "%%e"=="%1" set /a c=!c! + 1
if "%%f"=="%1" set /a c=!c! + 1
if "%%g"=="%1" set /a c=!c! + 1
if "%%h"=="%1" set /a c=!c! + 1
if "%%i"=="%1" set /a c=!c! + 1
if "%%j"=="%1" set /a c=!c! + 1
if "%%k"=="%1" set /a c=!c! + 1
if "%%l"=="%1" set /a c=!c! + 1
if "%%m"=="%1" set /a c=!c! + 1
if "%%n"=="%1" set /a c=!c! + 1
if "%%o"=="%1" set /a c=!c! + 1
if "%%p"=="%1" set /a c=!c! + 1
if "%%q"=="%1" set /a c=!c! + 1
if "%%r"=="%1" set /a c=!c! + 1
if "%%s"=="%1" set /a c=!c! + 1
if "%%t"=="%1" set /a c=!c! + 1
if "%%u"=="%1" set /a c=!c! + 1
if "%%v"=="%1" set /a c=!c! + 1
if "%%w"=="%1" set /a c=!c! + 1
if "%%x"=="%1" set /a c=!c! + 1
if "%%y"=="%1" set /a c=!c! + 1
if "%%z"=="%1" set /a c=!c! + 1
)
echo count=%c%

echo.
echo Display %2
echo.
type %2

Output:

C:\test>cnt812.bat  the  id.txt
count=10

Display id.txt

the
the
the
the
the the the
the the the

C:\test>
Title: Re: how to get the Count of string in file
Post by: victoria on August 12, 2010, 08:57:20 PM
Quote
Can we use FINDSTR


sed s/the/the\n/g id.txt  |  findstr the   | find /c /v ""
count=10

Sed for Windows is an easy download.

Sed stands for stream editor.
Title: Re: how to get the Count of string in file
Post by: Salmon Trout on August 13, 2010, 12:46:11 AM
http://thesystemguard.com/NTCmdLib/Functions/SCOUNT.htm

Title: Re: how to get the Count of string in file
Post by: Salmon Trout on August 13, 2010, 01:47:06 PM
Code:
@echo off
 >substringcount.vbs echo  substring = wscript.arguments(0)
>>substringcount.vbs echo longstring = wscript.arguments(1)
>>substringcount.vbs echo    Subslen = Len(Substring)
>>substringcount.vbs echo    longlen = Len(longstring)
>>substringcount.vbs echo   Subcount = 0
>>substringcount.vbs echo   Substart = InStr ( longstring, Substring )
>>substringcount.vbs echo If Substart ^> 0 Then
>>substringcount.vbs echo Do
>>substringcount.vbs echo Subcount = Subcount + 1
>>substringcount.vbs echo longstring = Mid( longstring, ( Substart + Subslen ) )
>>substringcount.vbs echo Substart = InStr ( longstring,Substring )
>>substringcount.vbs echo If Substart = 0 Then Exit Do
>>substringcount.vbs echo Loop
>>substringcount.vbs echo End If
>>substringcount.vbs echo wscript.echo Subcount

set mainstring="arun*America*MSC~INS*dfffs*Sdfsd*sdfsd~ssfsd*sdfsd~INS*dfffs*sdfsdf*sdfs~"
set substring="INS*"

for /f "delims=" %%C in ('cscript //nologo substringcount.vbs %substring% %mainstring%') do set count=%%C

echo Found string %substring% %count% times in string %mainstring%


Code:
S:\>test.bat
Found string "INS*" 2 times in string "arun*America*MSC~INS*dfffs*Sdfsd*sdfsd~ssfsd*sdfsd~INS*dfffs*sdfsdf*sdfs~"
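The script above works in three steps: find a match with InStr, count it, then use Mid to drop everything up to the end of that match before searching again. The Python sketch below runs the same loop; the names are hypothetical. Instead of cutting the string down, it passes a start position to str.find. Mid returns a new copy of the remaining text on every pass, and on a 35 MB line that copying can add up.

```python
def count_with_find(haystack, needle):
    """Same loop as the InStr/Mid script: find a match, count it, and
    continue searching right after it until no match is left."""
    if not needle:
        raise ValueError("search string must not be empty")
    count = 0
    # Like InStr(longstring, Substring), but 0-based; -1 means "not found"
    pos = haystack.find(needle)
    while pos != -1:
        count += 1
        # Instead of shortening the string with Mid(), search from a start index
        pos = haystack.find(needle, pos + len(needle))
    return count

sample = "arun*America*MSC~INS*dfffs*Sdfsd*sdfsd~ssfsd*sdfsd~INS*dfffs*sdfsdf*sdfs~"
print(count_with_find(sample, "INS*"))  # 2
```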
Title: Re: how to get the Count of string in file
Post by: victoria on August 13, 2010, 03:18:30 PM
Quote
Code:
@echo off
 

Code:
S:>test.bat
Found string \"INS*\" 2 times in \"string \"arun*America*MSC~INS*dfffs*Sdfsd*sdfsd~ssfsd*sdfsd~INS*dfffs*sdfsdf*sdfs~\"

(http://i7.photobucket.com/albums/y268/billrich/ststr.jpg)


cntstr.bat  INS*  st813.txt

rem @echo  off

sed s/INS*/INS**n/g st813.txt   | findstr INS*  | find  /c /v  **
count=2

echo Display string
Display string


type st813.txt
arun*America*MSC~INS*dfffs*dfsd*sdfsd~ssfsd*sdfsd~INS*dfffs*sdfsdf*sdfs~
* replace * with backslash
** replace with double quotes

( see above )
Title: Re: how to get the Count of string in file
Post by: vishuvishal on August 13, 2010, 03:42:15 PM

Quote
sed s/the/the*n/g id.txt  |  findstr the   | find /c /v **
count=10

* replace * with backslash symbol
**  replace ** with two double quotes

sed for windows is an easy download

sed  means stream editor


Thanks for all the research work. I appreciate it.

You are really hard working.





I consider myself a beginner.
I think I must keep my mouth shut.
 :P :P :P :-X :-X :-X :P :P :P

Thanks Victoria.
Title: Re: how to get the Count of string in file
Post by: BC_Programmer on August 13, 2010, 03:45:20 PM
vishuvishal: just out of curiosity, where are you from?  :)
Title: Re: how to get the Count of string in file
Post by: Salmon Trout on August 13, 2010, 04:11:33 PM
Quote
Found string \"INS*\" 2 times in \"string \"arun*America*MSC~INS*dfffs*Sdfsd*sdfsd~ssfsd*sdfsd~INS*dfffs*sdfsdf*sdfs~\"

Where are those back slashes coming from?
Title: Re: how to get the Count of string in file
Post by: Salmon Trout on August 13, 2010, 04:12:12 PM
Quote
vishuvishal: just out of curiosity, where are you from?  :)

Not too far from Bill's trailer, I daresay.
Title: Re: how to get the Count of string in file
Post by: vishuvishal on August 13, 2010, 04:30:30 PM
Quote from: Salmon Trout on August 13, 2010, 04:12:12 PM
Quote from: BC_Programmer on August 13, 2010, 03:45:20 PM
vishuvishal: just out of curiosity, where are you from?  :)

Not too far from Bill's trailer, I daresay.

I am far from there.

I am somewhere from eastern side.
I hope you will hate me for this.
Title: Re: how to get the Count of string in file
Post by: BC_Programmer on August 13, 2010, 05:10:05 PM
Quote
I am far from there.
And yet in the same time zone?

Quote
I am somewhere from eastern side.
Eastern side of what?
Quote
I hope you will hate me for this.

um, ok.

Title: Re: how to get the Count of string in file
Post by: victoria on August 13, 2010, 05:14:09 PM
Quote
Where are those back slashes coming from?


I connect to Computerhope.com through a proxy server.

The following is a guess about the origin of the random backslashes:
The editor at the proxy server posts my post here at Computerhope.com?
The editor at the proxy server inserts the random backslashes?

Or the staff here at computerhope.com set their editor to insert random backslashes?

I do not know how to correct the problem.

Thanks for your help.


Title: Re: how to get the Count of string in file
Post by: BC_Programmer on August 13, 2010, 05:38:40 PM
Quote
I do not know how to correct the problem.

Go away. That'll fix it.
Title: Re: how to get the Count of string in file
Post by: victoria on August 13, 2010, 06:15:37 PM
Quote
Go away. That will  fix it.

What have I done to hurt anything?

Why should I leave?
Title: Re: how to get the Count of string in file
Post by: victoria on August 13, 2010, 06:34:20 PM

Quote
Thanks for all the research work. I appreciate it.

You are really hard working.


I enjoy trying to answer questions.
I believe sed is a very useful tool for many problems.
Sed was developed at AT&T Bell Labs many years ago for the Unix operating system.
Sed is now used with many operating systems.

p.s. Ignore the negative comments by some of the other posters.
The people making the negative comments have a ton of good  information when they
choose to help.
Why do they have a need to insult people who came to Computerhope looking for help?

Good Luck
Title: Re: how to get the Count of string in file
Post by: victoria on August 13, 2010, 08:45:51 PM

Quote
Thanks for all the research work. I appreciate it.

(http://i7.photobucket.com/albums/y268/billrich/waldo.jpg)

Title: Re: how to get the Count of string in file
Post by: Salmon Trout on August 14, 2010, 12:50:29 AM
Quote
And yet in the same time zone?
Eastern side of what?

Billrich's trailer? Billrich's head more like.

Title: Re: how to get the Count of string in file
Post by: Salmon Trout on August 14, 2010, 01:08:04 AM
Quote
Why should I leave?

Because you were banned before and forbidden to return.
Title: Re: how to get the Count of string in file
Post by: Salmon Trout on August 14, 2010, 02:54:45 AM
FWIW, vbs tidied up...

Code:
@echo off

 >substringcount.vbs echo  substring = wscript.arguments (0)
>>substringcount.vbs echo longstring = wscript.arguments (1)
>>substringcount.vbs echo    Subslen = Len ( Substring )
>>substringcount.vbs echo   count = 0
>>substringcount.vbs echo Do
>>substringcount.vbs echo Substart = InStr ( longstring, Substring )
>>substringcount.vbs echo If Substart ^> 0 then count = count + 1
>>substringcount.vbs echo longstring = Mid ( longstring, ( Substart + Subslen ) )
>>substringcount.vbs echo Loop Until Substart = 0
>>substringcount.vbs echo wscript.echo count

set bigstring="arun*America*MSC~INS*dfffs*Sdfsd*sdfsd~ssfsd*sdfsd~INS*dfffs*sdfsdf*sdfs~"
set substring="INS*"

for /f "delims=" %%C in ('cscript //nologo substringcount.vbs %substring% %bigstring%') do set count=%%C

echo Found string %substring% %count% times in string %bigstring%

del substringcount.vbs
Title: Re: how to get the Count of string in file
Post by: Salmon Trout on August 14, 2010, 03:22:12 AM
I wonder if Billrich ("Victoria") and Vishuvishal are one and the same person? If so he is laughing at us.
Title: How to get the Count of string in file
Post by: victoria on August 14, 2010, 08:44:22 AM
(http://i7.photobucket.com/albums/y268/billrich/fish.jpg)



C:\test>type  cntstr.bat
@echo  off
sed s/%1/%1\n/g %2 |findstr %1| find  /c /v  ""

echo Display string
echo.
type %2

Output:

C:\test>cntstr.bat  INS*  bigstring.txt
count=2

Display string

arun*America*MSC~INS*dfffs*Sdfsd*sdfsd~ssfsd*sdfsd~INS*dfffs*sdfsdf*sdfs~
Title: Re: how to get the Count of string in file
Post by: Salmon Trout on August 14, 2010, 09:49:13 AM
http://i7.photobucket.com/albums/y268/billrich/fish.jpg
Title: Re: how to get the Count of string in file
Post by: Salmon Trout on August 14, 2010, 10:01:25 AM
Code:
@echo off
setlocal enabledelayedexpansion

 >substringcount.vbs echo  substring = wscript.arguments (0)
>>substringcount.vbs echo longstring = wscript.arguments (1)
>>substringcount.vbs echo    Subslen = Len ( Substring )
>>substringcount.vbs echo   count = 0
>>substringcount.vbs echo Do
>>substringcount.vbs echo Substart = InStr ( longstring, Substring )
>>substringcount.vbs echo If Substart ^> 0 then count = count + 1
>>substringcount.vbs echo longstring = Mid ( longstring, ( Substart + Subslen ) )
>>substringcount.vbs echo Loop Until Substart = 0
>>substringcount.vbs echo wscript.echo count

set substring=INS*
set infile=test.txt

set total=0
for /f "delims=" %%A in (%infile%) do (
     set bigstring=%%A
for /f "delims=" %%C in ( ' cscript //nologo substringcount.vbs "%substring%" "!bigstring!" ' ) do set /a total=!total!+%%C
)
del substringcount.vbs

echo Found string %substring% %total% times in file %infile%


Code:
arun*America*MSC~INS*egggs*Segse*segse~ssgse*segse~INS*egggs*segseg*segs~
arun*America*MSC~INS*dfffs*Sdfsd*sdfsd~ssfsd*sdfsd~INS*dfffs*sdfsdf*sdfs~
arun*America*MSC~INS*egggs*Segse*segse~ssgse*segse~INS*egggs*segseg*segs~
arun*America*MSC~INS*dfffs*Sdfsd*sdfsd~ssfsd*sdfsd~INS*dfffs*sdfsdf*sdfs~
arun*America*MSC~INS*egggs*Segse*segse~ssgse*segse~INS*egggs*segseg*segs~
arun*America*MSC~INS*dfffs*Sdfsd*sdfsd~ssfsd*sdfsd~INS*dfffs*sdfsdf*sdfs~
arun*America*MSC~INS*egggs*Segse*segse~ssgse*segse~INS*egggs*segseg*segs~
arun*America*MSC~INS*dfffs*Sdfsd*sdfsd~ssfsd*sdfsd~INS*dfffs*sdfsdf*sdfs~

Code:
Found string INS* 16 times in file test.txt
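Reading line by line works well for a file like test.txt. It does not save any memory for Arun's file, though, because that file is one single 35 MB line. The Python sketch below is not from this thread; the function name and the 1 MB default chunk size are assumptions. It reads the input in fixed-size chunks. From each chunk it carries over the unmatched tail, at most len(needle) - 1 characters, so a match split across two chunks is still counted exactly once.

```python
import io

def count_in_stream(stream, needle, chunk_size=1 << 20):
    """Count non-overlapping occurrences of needle in a text stream
    without reading the whole stream into memory at once."""
    if not needle:
        raise ValueError("search string must not be empty")
    keep = len(needle) - 1  # longest partial match that can straddle a chunk edge
    count = 0
    carry = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer = carry + chunk
        last_end = 0
        pos = buffer.find(needle)
        while pos != -1:
            count += 1
            last_end = pos + len(needle)
            pos = buffer.find(needle, last_end)
        # Keep only the unmatched tail that could be the start of a match
        carry = buffer[max(last_end, len(buffer) - keep):]
    return count

sample = "arun*America*MSC~INS*dfffs*Sdfsd*sdfsd~ssfsd*sdfsd~INS*dfffs*sdfsdf*sdfs~"
text = "\n".join([sample] * 8)
print(count_in_stream(io.StringIO(text), "INS*", chunk_size=5))  # 16
```

To use it on a real file, pass an open text file. For example: `with open("file.txt", encoding="ascii") as f: print(count_in_stream(f, "INS*"))`. The file name and encoding there are placeholders.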


Title: Re: how to get the Count of string in file
Post by: victoria on August 14, 2010, 10:35:10 AM


C:\test>type   cntstr.bat
@echo  off

sed s/%1/%1\n/g %2 |findstr %1| find  /c /v  ""

echo Display string
echo.
type %2

Output:

C:\test> cntstr.bat  INS*  test.txt
count=16

Display string


arun*America*MSC~INS*egggs*Segse*segse~ssgse*segse~INS*egggs*segseg*segs~
arun*America*MSC~INS*dfffs*Sdfsd*sdfsd~ssfsd*sdfsd~INS*dfffs*sdfsdf*sdfs~
arun*America*MSC~INS*egggs*Segse*segse~ssgse*segse~INS*egggs*segseg*segs~
arun*America*MSC~INS*dfffs*Sdfsd*sdfsd~ssfsd*sdfsd~INS*dfffs*sdfsdf*sdfs~
arun*America*MSC~INS*egggs*Segse*segse~ssgse*segse~INS*egggs*segseg*segs~
arun*America*MSC~INS*dfffs*Sdfsd*sdfsd~ssfsd*sdfsd~INS*dfffs*sdfsdf*sdfs~
arun*America*MSC~INS*egggs*Segse*segse~ssgse*segse~INS*egggs*segseg*segs~
arun*America*MSC~INS*dfffs*Sdfsd*sdfsd~ssfsd*sdfsd~INS*dfffs*sdfsdf*sdfs~

C:\test>
Title: Re: how to get the Count of string in file
Post by: victoria on August 14, 2010, 10:37:00 AM
Quote
http://i7.photobucket.com/albums/y268/billrich/fish.jpg

Use the image tag: [img]
Title: Re: how to get the Count of string in file
Post by: Salmon Trout on August 14, 2010, 10:44:21 AM
Bill, will you please quit trolling my posts? I posted the text of the link, without image tags, in order to display the presence of "billrich" in the link to the image hosted in your photobucket account, to show that "Victoria" is in fact the banned troll Billrich/marvinengland etc.

Unlike Bill's "solution", mine does not rely on 3rd party addons. And I think my code looks prettier. I always think SED commands look like lines from a Martian's shopping list. Maybe that's why Bill likes them so much?



Title: How to get the Count of string in file
Post by: victoria on August 14, 2010, 11:34:00 AM
Quote
Code:
c:\test> gawk "{m=gsub("INS",""); total+=m}END{print "total:" total}" file

Ghostdog,

Some of the  members here at computerhope.com believe we should not use gawk and sed because the vbs script and batch  look better.
Title: Re: How to get the Count of string in file
Post by: Salmon Trout on August 14, 2010, 11:51:01 AM
Quote
Ghostdog,

Some of the  members here at computerhope.com believe we should not use gawk and sed because the vbs script and batch  look better.

I sure hope they work out a way of permanently banning you.



Title: Re: How to get the Count of string in file
Post by: victoria on August 14, 2010, 02:22:19 PM
Quote
I sure hope they work out a way of permanently banning you.


 Which post is offensive?  I have done nothing wrong.
Title: Re: How to get the Count of string in file
Post by: Salmon Trout on August 14, 2010, 02:37:51 PM
Quote
Which post is offensive?

All of them.

Title: Re: how to get the Count of string in file
Post by: kpac on August 14, 2010, 02:50:16 PM
Quote
I have done nothing wrong.
Circumventing a ban is enough.
Title: Re: how to get the Count of string in file
Post by: BC_Programmer on August 14, 2010, 04:15:08 PM
here's my Counting function, VBS:



Code:
Function GetCountStr(ByVal searchIn, ByVal SearchFor) As Long
    GetCountStr = (Len(searchIn) - Len(Replace(searchIn, SearchFor, ""))) / Len(SearchFor)
End Function


dim inputstrm
Dim lookin,lookfor
Set inputstrm = CreateObject("Scripting.FileSystemObject").OpenTextFile(WScript.Arguments(0))
lookfor = WScript.Arguments(1)
lookin=inputstrm.ReadAll()
WScript.Echo GetCountStr(lookin,lookfor)

I've tested the function, but not the code using it.
Title: Re: how to get the Count of string in file
Post by: victoria on August 14, 2010, 04:39:19 PM

Quote
Code:
Function GetCountStr(ByVal searchIn, ByVal SearchFor) As Long
    GetCountStr = (Len(searchIn) - Len(Replace(searchIn, SearchFor, ""))) / Len(SearchFor)
End Function


dim inputstrm
Dim lookin,lookfor
Set inputstrm = CreateObject("Scripting.FileSystemObject").OpenTextFile(WScript.Arguments(0))
lookfor = WScript.Arguments(1)
lookin=inputstrm.ReadAll()
WScript.Echo GetCountStr(lookin,lookfor)

I have tested the function, but not the code using it.

That is great; now test all the code and show the count of how many times a string appears in a file.

Look at Sidewinder's code in reply 6 for how to do this.

Good Luck

Title: Re: how to get the Count of string in file
Post by: BC_Programmer on August 14, 2010, 05:04:08 PM
It works. I had to remove the type declaration that was accidentally left in, since the "original" is VB6.

I also added a case-insensitive option, "/i".

Code: [Select]
Function GetCountStr(ByVal searchIn, ByVal SearchFor,Byval CompareText)
    CompareText=CBool(CompareText)
    GetCountStr = (Len(searchIn) - Len(Replace(searchIn, SearchFor, "",1,-1,abs(CompareText)))) / Len(SearchFor)
End Function


dim inputstrm
Dim lookin,lookfor

'see if /i was specified....
for each looparg in WScript.Arguments
    If UCase(looparg)="-I" or UCase(looparg)="/I" Then
       ignorecase=true
       Exit For
    End If
Next

Set inputstrm = CreateObject("Scripting.FileSystemObject").OpenTextFile(WScript.Arguments(0))
lookfor = WScript.Arguments(1)
lookin=inputstrm.ReadAll()
WScript.Echo GetCountStr(lookin,lookfor,ignorecase)
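For anyone who wants to try the delete-and-measure idea outside Windows Script Host, here is a rough Python translation of GetCountStr with the /i option. get_count_str is a hypothetical name; str.replace is assumed to stand in for Replace with binary compare, and re.sub with re.IGNORECASE for text compare (re.escape keeps characters such as "*" literal).

```python
import re


def get_count_str(search_in: str, search_for: str, ignore_case: bool = False) -> int:
    """Count non-overlapping occurrences of search_for in search_in by
    deleting every occurrence and measuring how much shorter the text got."""
    if not search_for:
        raise ValueError("search_for must not be empty")
    if ignore_case:
        # re.escape makes characters such as '*' match literally
        removed = re.sub(re.escape(search_for), "", search_in, flags=re.IGNORECASE)
    else:
        removed = search_in.replace(search_for, "")
    # every deleted occurrence shortened the text by len(search_for)
    return (len(search_in) - len(removed)) // len(search_for)


if __name__ == "__main__":
    sample = "arun*America*MSC~INS*dfffs*Sdfsd*sdfsd~ssfsd*sdfsd~INS*dfffs*sdfsdf*sdfs~"
    print(get_count_str(sample, "INS*"))                          # 2
    print(get_count_str("Caltech or caltech?", "CALTECH", True))  # 2
```

Like the VBScript version, this counts non-overlapping matches from left to right, so "aaaa" contains "aa" twice, not three times.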

I shall now endeavour to emulate the ridiculous manner in which Bill tests his code. I will refrain from the classic posting of the output from dir /? for no reason though.

test "input" file, "zwicky.txt":

Quote
in the 1930s and 1940s, many of Fritz Zwicky's colleagues regarded him as an irritating buffoon. Future generations of astronomers would look back on him as a creative genius.
    "By the time I knew Fritz in 1953, he was thoroughly convinced that he had the inside track to ultimate knowledge, and that everyone else was wrong," says William Fowler, then a student at Caltech (The Californian Institute of Technology) where Zwicky taught and did research. Jesse Greenstein, a Caltech colleague of Zwicky's from the late 1940's onward, recalls Zwicky as "a self-proclaimed genius... There's no doubt that he had a mind which was quite extraordinary, But he was also, although he didn't admit it, untutored and not self-controlled.
... HE taught a course in physics for which the admission was at his pleasure. If he thought that a person was sufficiently devoted to his ideas, that person could be admitted... He was very much alone [ among the Caltech physics faculty, and was] not popular with the establishment... His publications often included violent attacks on other people."

Zwicky-- a stocky, cocky man, always ready for a fight -- did not hesitate to proclaim his inside track to ultimate knowledge, or to tout the revelations it brought. In lecture after lecture during the 1930s, and article after published article, he trumpeted the concept of a neutron star-- a concept that he, Zwicky, had invented to explain the origins of the most energetic phenomena seen by astronomers: supernovae, and cosmic rays. He even went on the air in a nationally broadcast radio show to popularize his neutron stars. But under close scrutiny, his articles and lectures were unconvincing. They contained little substantiation for his ideas.

It was rumoured that Robert Millikan (the man who had built Caltech into a powerhouse among science institutions), when asked in the midst of all this hoopla why he kept Zwicky at Caltech, replied that it just might turn out that some of Zwicky's far-out ideas were right. Millikan, unlike some others in the science establishment, must have seen hints of Zwicky's intuitive genius - a genius that became widely recognized only thirty five years later, when observational astronomers discovered real neutron stars in the sky and verified some of Zwicky's extravagant claims about them.

Code: [Select]
D:\>Cscript /NOLOGO countstr.vbs zwicky.txt caltech /i
5

D:\>Cscript /NOLOGO countstr.vbs zwicky.txt establishment
2

D:\>Cscript /NOLOGO countstr.vbs zwicky.txt Establishment
0

D:\>Cscript /NOLOGO countstr.vbs zwicky.txt Establishment /i
2

D:\>Cscript /NOLOGO countstr.vbs zwicky.txt zwicky /i
10

D:\>
Title: How to get the Count of string in file
Post by: victoria on August 14, 2010, 05:42:50 PM

C:\test>type cntstr.bat
@echo off

sed s/%1/%1\n/g %2 | findstr %1 | find /c /v ""

echo.
rem type %2

Output:

C:\test>cntstr.bat  Zwicky  zwicky.txt
count=10

C:\test>cntstr.bat  Caltech  zwicky.txt
count=5

C:\test>cntstr.bat  establishment  zwicky.txt
count=2

C:\test>
Title: How to get the Count of string in file
Post by: victoria on August 14, 2010, 08:46:07 PM
(The following batch code using tokens found the right count, but I had to massage the input file. Could someone with more experience with tokens correct the code? Thanks.)

C:\test>type try813.bat
@echo off
set /a c=0
setlocal enabledelayedexpansion
for /f "tokens=1-26" %%a in (%2) do (
  if "%%a"=="%1" set /a c=!c! + 1
  if "%%b"=="%1" set /a c=!c! + 1
  if "%%c"=="%1" set /a c=!c! + 1
  if "%%d"=="%1" set /a c=!c! + 1
  if "%%e"=="%1" set /a c=!c! + 1
  if "%%f"=="%1" set /a c=!c! + 1
  if "%%g"=="%1" set /a c=!c! + 1
  if "%%h"=="%1" set /a c=!c! + 1
  if "%%i"=="%1" set /a c=!c! + 1
  if "%%j"=="%1" set /a c=!c! + 1
  if "%%k"=="%1" set /a c=!c! + 1
  if "%%l"=="%1" set /a c=!c! + 1
  if "%%m"=="%1" set /a c=!c! + 1
  if "%%n"=="%1" set /a c=!c! + 1
  if "%%o"=="%1" set /a c=!c! + 1
  if "%%p"=="%1" set /a c=!c! + 1
  if "%%q"=="%1" set /a c=!c! + 1
  if "%%r"=="%1" set /a c=!c! + 1
  if "%%s"=="%1" set /a c=!c! + 1
  if "%%t"=="%1" set /a c=!c! + 1
  if "%%u"=="%1" set /a c=!c! + 1
  if "%%v"=="%1" set /a c=!c! + 1
  if "%%w"=="%1" set /a c=!c! + 1
  if "%%x"=="%1" set /a c=!c! + 1
  if "%%y"=="%1" set /a c=!c! + 1
  if "%%z"=="%1" set /a c=!c! + 1
)
echo count=%c%
echo Display %2
rem type %2
Output:
C:\test> try813.bat  Zwicky  zwicky.txt
count=10
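A rough Python equivalent of what try813.bat is doing, for comparison. count_tokens is a hypothetical name, and str.split() (any run of whitespace) is only an approximation of the FOR /F default delimiters of space and tab. Unlike "tokens=1-26" it does not drop tokens after the 26th on a line, which, together with punctuation attached to words ("Zwicky's" is not the token "Zwicky"), is a likely reason the input file needed massaging.

```python
def count_tokens(lines, word: str) -> int:
    """Count whitespace-separated tokens that equal `word` exactly
    (case-sensitive, like IF without /I), with no per-line token limit."""
    return sum(token == word for line in lines for token in line.split())


if __name__ == "__main__":
    text = ["Zwicky taught at Caltech.", "Fritz Zwicky's colleagues"]
    # prints 1: "Zwicky's" is a different token from "Zwicky"
    print(count_tokens(text, "Zwicky"))
```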
Title: Re: How to get the Count of string in file
Post by: ghostdog74 on August 15, 2010, 01:07:39 AM
Ghostdog,

Some of the members here at computerhope.com believe we should not use gawk and sed because VBScript and batch look better.
This is the biggest joke of the year. awk/sed are excellent for parsing and modifying files. Awk is also a little programming language capable of replacing cmd.exe. Batch/VBScript look better? Better in what sense? Do more lines of code mean better? My gawk statement takes only 1 line, and it saves me enough time to go on to my other assignments, while you have to rack your brain and come up with long, messy batch files like the last one you posted. By the time you've finished, I'm already off to bed and enjoying my sleep.
Title: Re: how to get the Count of string in file
Post by: Salmon Trout on August 15, 2010, 01:22:38 AM
Quote
better in what sense?

More readable by others.
Title: Re: how to get the Count of string in file
Post by: ghostdog74 on August 15, 2010, 01:33:06 AM
More readable by others.

VBScript maybe, but definitely not batch.
Title: Re: how to get the Count of string in file
Post by: Salmon Trout on August 15, 2010, 01:42:17 AM
vbscript maybe, but definitely not batch.

I have to agree with you there. When I post one of those batch "solutions" where the batch file writes a vbscript on the fly, calls it, and then deletes the vbs, I get an uneasy feeling, like a surgeon advising somebody, when removing a gallstone with a carpenter's saw, to attach a scalpel blade to it with duct tape.
Title: Re: how to get the Count of string in file
Post by: ghostdog74 on August 15, 2010, 02:29:50 AM
I have to agree with you there. When I post one of those batch "solutions" where the batch file writes a vbscript on the fly, calls it, and then deletes the vbs, I get an uneasy feeling, like a surgeon advising somebody, when removing a gallstone with a carpenter's saw, to attach a scalpel blade to it with duct tape.

I always recommend against hybrids, i.e. combining batch and VBScript. Mostly from my own experience, I find them difficult to read and troubleshoot because of the intermixing of different syntaxes, etc. VBScript can do what batch does, so I myself would write the entire thing in VBScript. Anyway, this is already off-topic, so...
Title: Re: how to get the Count of string in file
Post by: Salmon Trout on August 15, 2010, 09:04:13 AM
Project Gutenberg has some books in plain-text format. My code seems woefully slow compared to BC_Programmer's, although either script counted "God" in the King James Bible in less than half a second. But see below...

Salmon-count.vbs

This is how I am going to try to do VBscripts in future...

Code: [Select]
Option Explicit

'Setup
Dim ObjFSO
Dim ObjTS
Dim StrFileName
Dim StrLookString
Dim StrThisline

Dim SngStartSec
Dim SngEndSec
Dim SngElapsed
Dim SngLineCount
Dim SngTotalCount
Dim SngSubsLen
Dim SngSubStart
Dim SngCaseSensitive

'Input filename
StrFileName=Wscript.Arguments(0)

'String to search for
StrLookString=Wscript.Arguments(1)

'Case type - 1 = case sensitive 0 = case insensitive
SngCaseSensitive = Wscript.Arguments (2)

'Length of string to search for
SngSubsLen = Len (StrLookString)

'if case insensitive search
'convert to lower case
If SngCaseSensitive = 0 Then StrLookString = LCase(StrLookString)

'Initialise File System Object
Set ObjFSO=Createobject("Scripting.Filesystemobject")

'Open input file
Set ObjTS=ObjFSO.Opentextfile(StrFileName)

'Store start time (secs since midnight)
SngStartSec = Timer

'Keep reading lines until all done
Do While Not ObjTS.Atendofstream
    'Get line
    StrThisLine=ObjTS.Readline
    'if case insensitive search
    'convert to lower case
    If SngCaseSensitive = 0 Then StrThisLine=LCase(StrThisLine)
    'Set count to zero
    SngLineCount = 0
    Do
        'Is string in line? If so, get place
        SngSubStart = InStr ( StrThisLine, StrLookString )
        'If found, add 1 to counter
        If SngSubStart > 0 then SngLineCount = SngLineCount + 1
        'If found, chop off string before
        StrThisLine = Mid ( StrThisLine, ( SngSubstart + SngSubsLen ) )
        'Exit when no more found
    Loop Until SngSubstart = 0
    'Add count from this line to total
    SngTotalCount = SngTotalCount + SngLineCount
Loop

'Close input file
ObjTS.Close

'Store end time (secs since midnight)
SngEndSec = Timer

'Subtract to get elapsed
SngElapsed = SngEndsec - SngStartSec

'Show results
wscript.echo SngTotalCount
wscript.echo formatnumber(SngElapsed,3)


BCP-count.vbs

Code: [Select]
Function GetCountStr(ByVal searchIn, ByVal SearchFor,Byval CompareText)
    CompareText=CBool(CompareText)
    GetCountStr = (Len(searchIn) - Len(Replace(searchIn, SearchFor, "",1,-1,abs(CompareText)))) / Len(SearchFor)
End Function


dim inputstrm
Dim lookin,lookfor

Dim StartSec, Endsec, Elapsed

'see if /i was specified....
for each looparg in WScript.Arguments
    If UCase(looparg)="-I" or UCase(looparg)="/I" Then
       ignorecase=true
       Exit For
    End If
Next

Startsec=Timer
Set inputstrm = CreateObject("Scripting.FileSystemObject").OpenTextFile(WScript.Arguments(0))
lookfor = WScript.Arguments(1)
lookin=inputstrm.ReadAll()

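'Note: Endsec is taken before GetCountStr runs, so Elapsed covers
'opening and reading the file, not the counting itself.
Endsec=Timer
Elapsed=Endsec - Startsec
WScript.Echo GetCountStr(lookin,lookfor,ignorecase)
wscript.echo Formatnumber(Elapsed, 3)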


Code: [Select]
Salmon-count.vbs "H G Wells The War Of The Worlds.txt" "Martians" 1
156
0.043

BCP-count.vbs "H G Wells The War Of The Worlds.txt" "Martians"
156
0.016

Salmon-count.vbs "Complete Works Of Shakespeare.txt" "Hamlet" 1
113
0.688

BCP-count.vbs "Complete Works Of Shakespeare.txt" "Hamlet"
113
0.250

Salmon-count.vbs "Tolstoy War And Peace.txt" "Pierre" 1
1963
0.383

BCP-count.vbs "Tolstoy War And Peace.txt" "Pierre"
1963
0.145

Salmon-count.vbs "King James Bible.txt" "God" 1
4167
0.359

BCP-count.vbs "King James Bible.txt" "God"
4167
0.188

Salmon-count.vbs "Samuel Richardson Clarissa.txt" "she" 1
8861
1.156

bcp-count.vbs "Samuel Richardson Clarissa.txt" "she"
8861
0.234

but...

I downloaded a text file containing 1 million places of pi (1,000,000,002 bytes) with no carriage returns. I figured that my code wouldn't like that, so I used GNU fold to insert cr/lf pairs every 80 columns. However, when I tried BCP's code on it, oh dear! The system got awfully sluggish and I watched my available RAM go down from 3.2 GB to 24 MB before I used Process Explorer to terminate cscript.exe. But my "slow" code just chewed its way through in 1 minute 44 and a bit seconds...

Code: [Select]
salmon-count.vbs "1 billion places of pi.txt" "567" 0
975498
104.430

Code: [Select]
      351,218 H G Wells The War Of The Worlds.txt
    3,288,738 Tolstoy War And Peace.txt
    4,397,206 King James Bible.txt
    5,582,655 Complete Works Of Shakespeare.txt
    5,616,676 Samuel Richardson Clarissa.txt
1,025,000,002 1 billion places of pi.txt

System:

Shuttle SN78SH7, AMD Phenom II 945 (quad core), 4 GB Crucial 800 MHz DDR2 RAM,  Windows 7 64 bit, files read from Seagate 320GB external USB 2.0 drive.






Title: Re: how to get the Count of string in file
Post by: ghostdog74 on August 15, 2010, 09:54:15 AM
Most probably it's due to ReadAll(). BCP's code reads the whole file into memory. If your 1 billion pi (or is it 1 million?) text file is very big, then that explains the sluggishness of fitting it all into memory.
Title: How to get the Count of string in file
Post by: victoria on August 15, 2010, 10:16:42 AM


arun*America*MSC~INS*dfffs*Sdfsd*sdfsd~ssfsd*sdfsd~INS*dfffs*sdfsdf*sdfs~

I need to get a count of INS* in the above file. Am new to DOS Commands.


Arunavlp,

The posts at the end of your thread and in the middle are so far off topic that
the posters must start a new thread for their strange ideas.


I'm pleased that your problem of counting how often a string appears in a file has been answered several times.

The Sed solution is the best solution.
Title: Re: how to get the Count of string in file
Post by: Sidewinder on August 15, 2010, 11:06:33 AM
Quote
The Sed solution is the best solution.

Well, that's certainly the definitive answer.

Actually, I still like my response back in post 6. By using the Replace function to insert nulls in place of all the occurrences of the search argument, the original string is effectively shortened (nulls have zero length). Using some 3rd-grade arithmetic, you can calculate the difference in length between the original string and the replacement string. This gives the number of characters that were removed. Dividing that by the length of the search argument gives the number of occurrences of the substring in the original string.
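As a worked example of that arithmetic, take the sample line from the first post as s (73 characters) and t = "INS*" (4 characters). Deleting both occurrences of t leaves a 65-character string s':

```latex
\text{count} = \frac{\lvert s \rvert - \lvert s' \rvert}{\lvert t \rvert} = \frac{73 - 65}{4} = 2
```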

PowerShell can do this as a one-liner more readable than sed. There is always more than one solution to any coding problem. Makes me wonder why many posters request a specific type of solution.

 8)
Title: Re: how to get the Count of string in file
Post by: Salmon Trout on August 15, 2010, 11:55:39 AM
If your 1 billion pi (or is it 1 million? ) text file size is very big

One thousand and twenty-five million and two bytes (1,025,000,002), as I posted above.

Quote
then that explains the sluggishly of fitting all into memory.

Did I imply that I did not already realise this?


Title: Re: how to get the Count of string in file
Post by: Salmon Trout on August 15, 2010, 11:58:33 AM
There is always more than one solution to any coding problem. Makes me wonder why many posters request a specific type solution.

Unlike hobbyists at home using their own systems, many people asking for help have already partially completed a script and/or are using their employer's computers, on which restrictions are in place that prevent the installation of 3rd-party software.
Title: Re: how to get the Count of string in file
Post by: BC_Programmer on August 15, 2010, 01:04:46 PM
I just sort of threw mine together, wasn't too interested in making sure it worked for gigantic files :P

Here's a version that reads in chunks instead:

Code: [Select]
Function GetCountStr(ByVal searchIn, ByVal SearchFor,Byval CompareText)
    CompareText=CBool(CompareText)
    GetCountStr = (Len(searchIn) - Len(Replace(searchIn, SearchFor, "",1,-1,abs(CompareText)))) / Len(SearchFor)
End Function


dim inputstrm
Dim lookin,lookfor

'see if /i was specified....
for each looparg in WScript.Arguments
    If UCase(looparg)="-I" or UCase(looparg)="/I" Then
       ignorecase=true
       Exit For
    End If
Next
Set FSO = CreateObject("Scripting.FileSystemObject")

'read in chunks of 32K:
chunksize = 32*1024

Set inputstrm = FSO.OpenTextFile(WScript.Arguments(0))
lookfor = WScript.Arguments(1)
RunnerCount = 0
strhangoff = ""
Do Until inputstrm.AtEndOfStream
    readchunk = strhangoff & inputstrm.Read(chunksize)
    RunnerCount = RunnerCount + GetCountStr(readchunk, lookfor, ignorecase)

    'Carry the last Len(lookfor)-1 characters into the next chunk: a match split
    'across the boundary is found there, but a whole match can never fit in the
    'hangoff, so nothing is counted twice.
    strhangoff = Right(readchunk, Len(lookfor) - 1)
Loop


WScript.Echo RunnerCount

I don't actually have a super extra large file to test it on, so I made one by duplicating the "zwicky.txt" file over itself several hundred times.

This one is certainly faster than the ReadAll() method. I've added a small provision so that it doesn't "miss" entries by reading half of the string at the end of a chunk and the rest at the start of the next chunk (thereby not finding it): a "hangoff" from the end of the previous chunk is copied to the start of the next chunk. I make sure the hangoff is one character shorter than the search string itself; this prevents finding the string twice in the edge case where it falls at the very end of a chunk (which would otherwise be counted twice: once in that chunk and once in the next chunk it would be copied to).
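The same hangoff idea, sketched in Python for anyone who wants to experiment with it. count_in_chunks and its parameters are illustrative names, not anything from the thread; the 32K default mirrors the VBScript above. The sketch assumes the search string cannot overlap itself (true for "INS*"), because a self-overlapping pattern such as "aa" can be counted differently once it is split across chunks.

```python
import io


def count_in_chunks(stream, needle: str, chunk_size: int = 32 * 1024) -> int:
    """Count occurrences of needle in a text stream read chunk by chunk.

    The last len(needle) - 1 characters of each window (the "hangoff") are
    prepended to the next chunk, so a match split across a chunk boundary is
    still seen. A whole match cannot fit inside the hangoff, so no match is
    counted twice. Assumes needle cannot overlap itself.
    """
    if not needle:
        raise ValueError("needle must not be empty")
    keep = len(needle) - 1
    hangoff = ""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # end of stream
            break
        window = hangoff + chunk
        total += window.count(needle)
        # window[-0:] would be the whole window, hence the explicit check
        hangoff = window[-keep:] if keep else ""
    return total


if __name__ == "__main__":
    data = "ab INS* cd " * 1000
    # a tiny chunk size forces many matches to straddle chunk boundaries
    print(count_in_chunks(io.StringIO(data), "INS*", chunk_size=5))  # 1000
```

For a real file, pass the open file object (opened in text mode) instead of the StringIO; only one chunk plus the hangoff is held in memory at a time.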

The main difficulty was getting used to the Microsoft FileSystemObjects; I'm used to using my own library. Not sure if there would be much of a speed difference there, but it's what I'm used to (not counting the .NET IO namespace). The interesting thing is that the only differences are method names (I chose "Eof" rather than "AtEndOfStream" to indicate the stream was at the end of the file) and, of course, ProgIDs; everything else could be pretty much exactly the same.


Quote
Actually I still like my response back in post 6. By using the replace  function to insert nulls in place of the all the occurrences of the search argument, the original string is effectively shortened (nulls have zero length). Using some 3rd grade arithmetic, you can calculate the difference in lengths between the original string and the replacement string. This gives the number of nulls that were added to the file. Dividing by the length of the search argument. the result is the number of occurrences of the substring in the original string.

Actually, your idea is pretty much a slight variant of what my routine does:

Code: [Select]
Function GetCountStr(ByVal searchIn, ByVal SearchFor,Byval CompareText)
    CompareText=CBool(CompareText)
    GetCountStr = (Len(searchIn) - Len(Replace(searchIn, SearchFor, "",1,-1,abs(CompareText)))) / Len(SearchFor)
End Function

It replaces the text being searched for with an empty string and then does the math. It's actually easier to do it this way rather than replacing it with null, since the size difference between the original and the "replaced" version will be an exact multiple of the length of the search string. I wrote this a good few years ago, and had to "translate" it from the VB6 it was written in to VBS.

Powershell can do this as a one liner more readable than SED. There is always more than one solution to any coding problem. Makes me wonder why many posters request a specific type solution.

It could be a one-liner in VBScript, but it would be both hard to read and somewhat silly. And it would require ReadAll() again.

Using .NET 4.0/C# 4.0, it might even be possible to read in a number of chunks at once and then "count" the occurrences in each one in parallel using the Parallel.For construct. The same would be possible in 3.5, but would require manually spinning up the threads and less than enviable use of locks to prevent resource contention. It's pretty interesting that such a simple problem can have such varied solutions, but not at all surprising.
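The parallel idea hinges on one detail: each worker should count only the matches that start in its own range, otherwise matches straddling a boundary get lost or counted twice. Here is a Python sketch of that decomposition, with ThreadPoolExecutor standing in for Parallel.For as an assumption for illustration; count_range and parallel_count are hypothetical names. In CPython the global interpreter lock means these threads most likely demonstrate the splitting logic rather than give a real speed-up; a process pool would be the usual route to actual parallelism.

```python
from concurrent.futures import ThreadPoolExecutor


def count_range(text: str, needle: str, start: int, end: int) -> int:
    """Count matches of needle that start inside text[start:end].

    The search window is extended by len(needle) - 1 characters, so a match
    that begins before `end` but finishes after it is still found, while a
    match beginning at or after `end` cannot fit in the window.
    """
    return text.count(needle, start, min(len(text), end + len(needle) - 1))


def parallel_count(text: str, needle: str, parts: int = 4) -> int:
    """Split text into `parts` ranges, count each range on a worker thread,
    then add the partial counts (assumes needle cannot overlap itself)."""
    if not needle:
        raise ValueError("needle must not be empty")
    if parts < 1:
        raise ValueError("parts must be at least 1")
    step = max(1, -(-len(text) // parts))  # ceiling division
    ranges = [(s, min(s + step, len(text))) for s in range(0, len(text), step)]
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partial_counts = pool.map(lambda r: count_range(text, needle, r[0], r[1]), ranges)
        return sum(partial_counts)


if __name__ == "__main__":
    line = "arun*America*MSC~INS*dfffs*Sdfsd*sdfsd~ssfsd*sdfsd~INS*dfffs*sdfsdf*sdfs~"
    print(parallel_count(line * 50, "INS*", parts=7))  # 100
```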


Title: Re: how to get the Count of string in file
Post by: vishuvishal on August 15, 2010, 05:45:55 PM
He he...
I hope this forum is for DOS,
not for VBS or VB or C.

Don't mind it.
But I have started to like batch programming.
I really appreciate your expertise.
As I see it, Windows functionality can be operated from DOS, because Windows itself is a DOS-operated operating system.
So I think you should count on batch rather than other languages.


If I said anything disparaging the integrity of any programmer, I really apologize for that.
I didn't mean it that way.
But can you point out which is the best IDE of the season?
Like, C is the best language.

Comments appreciated.

I know this is going off topic.



Thanks and regards.
Vishu
Title: Re: how to get the Count of string in file
Post by: ghostdog74 on August 15, 2010, 06:11:30 PM
One thousand and twenty-five million and two bytes (1,025,000,002), as I posted above.
So it's 1 million (but the filename passed to your vbscript says 1 billion).
Quote
Did I imply that I did not already realise this?
It appears so to me. You showed a benchmark between BCP's code and yours, then said BCP's was sluggish after a while, without stating your reasons or the conclusion of your findings. Makes one wonder why it happens, right?
Title: Re: how to get the Count of string in file
Post by: vishuvishal on August 15, 2010, 06:15:13 PM
so its 1 million (but your filename passed to your vbscript states 1 billion. )appears to me. You showed a benchmark between BCP and your code, then says BCP's one is sluggish after a while without stating your reasons and conclusion of your findings. Makes one wonder why it happens right?


I really don't know what you're talking about.
Title: Re: How to get the Count of string in file
Post by: ghostdog74 on August 15, 2010, 06:17:40 PM
The Sed solution is the best solution.
Not really! If it's a big file, using your method of substituting the word to include newlines (which is expensive compared to pure string counting) and then piping to 2 calls of the find command to get the count is not the best way to go. The best way is to count the number of words found AS YOU ITERATE THE FILE (with whatever tool is processing it) and keep the count in memory. That said, sed is not the best tool to use in this case.
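A minimal Python sketch of counting as you iterate, in the same spirit as Salmon Trout's line-by-line VBScript: only the current line and a running total are held in memory. count_streaming is a hypothetical name. Two caveats follow from reading by line. Memory is bounded by the longest line, so for the original 35MB single-line file this is no better than reading it all. Also, a match split by a line break (as with pi digits folded at 80 columns) is not counted.

```python
def count_streaming(lines, needle: str, ignore_case: bool = False) -> int:
    """Add up matches line by line, keeping only a running total in memory.

    Works with any iterable of lines, such as an open text file. Matches
    that span a line break are not counted.
    """
    if not needle:
        raise ValueError("needle must not be empty")
    if ignore_case:
        needle = needle.lower()  # like LCase on the search string
    total = 0
    for line in lines:
        if ignore_case:
            line = line.lower()
        total += line.count(needle)
    return total


if __name__ == "__main__":
    lines = ["Martians came\n", "the martians left\n", "Martians\n"]
    print(count_streaming(lines, "Martians"))                    # 2
    print(count_streaming(lines, "Martians", ignore_case=True))  # 3
```

For a file, pass the object returned by open(...) in text mode; iterating over it yields one line at a time.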

Title: Re: how to get the Count of string in file
Post by: ghostdog74 on August 15, 2010, 06:18:37 PM

Really don't know what you talking about.
Sorry, I don't care whether you know or not. My words are not for you.
Title: Re: how to get the Count of string in file
Post by: BC_Programmer on August 15, 2010, 06:19:59 PM
Quote
One thousand and twenty-five million and two bytes (1,025,000,002)
so its 1 million (but your filename passed to your vbscript states 1 billion. )

A billion is a thousand million... (in North America, at least)


Title: Re: how to get the Count of string in file
Post by: ghostdog74 on August 15, 2010, 06:25:33 PM
so its 1 million (but your filename passed to your vbscript states 1 billion. )

a Billion is a thousand millions... (In North America, at least)

OK, OK. But I am talking about post #83, where ST said he downloaded "1 million places of pi", and then his file name for testing the benchmark is "1 billion places of pi". He is showing a benchmark, and when there are ambiguities, it's only natural for the inquisitive mind to ask questions.
Title: Re: how to get the Count of string in file
Post by: BC_Programmer on August 15, 2010, 06:31:11 PM
ok ok. But i am talking about post #83. where ST said he download "1 million places of pi", then his file name for testing the benchmark is "1 billion places of pi". He is showing a benchmark, and when there are ambiguities, someone like me will question.

Doesn't much matter if it's a billion or a million, as long as the same inputs were used to test both; the exact size is more a curiosity (except in some cases).
Title: Re: how to get the Count of string in file
Post by: ghostdog74 on August 15, 2010, 06:37:35 PM
A billion and a million are different.
Title: Re: how to get the Count of string in file
Post by: ghostdog74 on August 15, 2010, 06:42:03 PM
C:\test>type   cntstr.bat
rem @echo  off
sed s/%1/%1\n/g %2 | egrep -c %1

C:\test>cntstr.bat  the yz.txt

C:\test>rem @echo  off

C:\test>sed s/the/the\n/g yz.txt   | egrep -c the
10

C:\test>type yz.txt
the
the
the
the
the the the
the the the

This example will also count words like thesis, stethescope, etc., which are not exactly the word "the". egrep is also deprecated; use grep -E.
Code: [Select]
grep -Eo "\bthe\b" file|wc -l
The above does not need to do substitution on the entire file, and it matches only the whole word.
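For comparison outside grep, a Python sketch of the same whole-word count, using re.findall with the \b word-boundary anchor (count_word is a hypothetical name). It is meant for ordinary words: \b needs a letter, digit or underscore on one side, so a search string that starts or ends with punctuation, like "INS*", will not behave the same way.

```python
import re


def count_word(text: str, word: str) -> int:
    r"""Count `word` only where it stands alone, in the spirit of
    grep -Eo "\bthe\b" file | wc -l. re.escape keeps any special
    characters in `word` literal."""
    return len(re.findall(r"\b" + re.escape(word) + r"\b", text))


if __name__ == "__main__":
    text = "thesis, other and the end"
    print(text.count("the"))        # 3: plain substring count
    print(count_word(text, "the"))  # 1: whole word only
```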
Title: Re: how to get the Count of string in file
Post by: BC_Programmer on August 15, 2010, 06:54:07 PM
a billion and a million is different.

Not in this case. What difference would it make to the results? Sure, the numbers will be larger for a billion than for a million, but it's not the actual numbers that are important, it's how the two numbers compare.

ST performed two tests: one with a smaller file, and one with a larger file. The two tests revealed that with a larger amount of data to read, my method causes a large IO bottleneck. Two points of reference are enough for a crude line-chart comparison of the two, and while it may not be entirely accurate, it can reveal specific trends in the two functions. For example, we can determine that my routine seems to run at something like O((n/4)^2), whereas his is a more linear method whose time taken is linearly related to the length of the file. In mine, this is not the case because additional overhead is required for the system to properly manage the larger amount of memory being used to store the entire string.

What is important here is that we are comparing the programs used, As long as the inputs are the same the comparisons are valid.

if you test program A and Program B with Input C, it's a fair comparison between A and B as long as C is the same for both.

It doesn't matter if there was a mixup over the specifics of the size of C. The comparison was between A and B.

If you compare a quicksort with a merge sort, whether you are testing with a million or a billion elements is largely beside the point; what's important is the comparison. If there was confusion over the layout of the data (such as how a quicksort can take longer than a merge sort on a nearly sorted array) and it was relevant, then yes, I would agree. But while there is indeed some ambiguity, it's irrelevant.
Title: Re: how to get the Count of string in file
Post by: ghostdog74 on August 15, 2010, 07:04:58 PM
Not in this case. What difference would it have on the results? sure, the numbers will be larger for a billion then for a million, but it's not the actual number that's important, it's how the two numbers compare.
If it's a larger file, then your method of slurping it all into memory is not a good solution. That's the difference. Why do you say it's not important? If the test files are like 1 thousand vs 100, then of course your method will work. The size of the test samples does matter when doing benchmarks, as it affects the design of the algorithm being used.
Title: Re: how to get the Count of string in file
Post by: victoria on August 15, 2010, 07:52:35 PM
this example will also count words like thesis,  stethescope, etc, which is not exactly the word "the". egrep is also deprecated. Use grep -E
Code: [Select]
grep -Eo "\bthe\b" file|wc -l
the above does not need to do substitution on the entire file and gets the exact string.

Ghost,
Your grep works. I had an old 2005 version.

Your skill level has improved. Who is your Tutor?

C:\test>grep  -Eo the  yz.txt
the
the
the
the
the
the
the
the
the
the

C:\test>grep  -Eo the  yz.txt  |  wc -l
      10

C:\test>type yz.txt
the
the
the
the
the the the
the the the
C:\test>
Title: Re: how to get the Count of string in file
Post by: BC_Programmer on August 15, 2010, 08:04:13 PM
If its a larger file, then your method of slurping all into memory is not a good solution. That's the difference. why do you say its not important? If the test files are like 1 thousand vs 100 , then of course your method will work. Size of the test samples do matter when doing benchmarks as it will affect the design of the algorithm being used.

Reread my post.
Title: Re: how to get the Count of string in file
Post by: ghostdog74 on August 15, 2010, 08:54:59 PM
Reread my post.

I reiterate my point. Size of a file does not matter if what you are comparing is the output of 2 pieces of code, that is, when you want to make sure the output produced by the 2 pieces of code is the same. Size of file does matter in a benchmark when you are concerned about the way the program is written and the algorithm used, that is, whether you have used the most optimized method when dealing with big files.

Because of the size of the file, you have chosen to read it in chunks. That's a direct consequence of taking size into consideration when designing your program. That's why size does matter in a benchmark. 1 million is way different from 1 billion!

Title: Re: how to get the Count of string in file
Post by: ghostdog74 on August 15, 2010, 08:58:04 PM
Ghost,
Your grep works
Of course. It's better than using sed, which you proclaim is the "best".

Quote
. I had an old 2005 version.
Time to change. We are not living in olden times anymore.

Quote
Your skill level has improved. Who is your Tutor?
I have been playing with *nix since ancient times. My tutors were greg and bill rich; now it's you and vishu...
Title: Re: how to get the Count of string in file
Post by: BC_Programmer on August 15, 2010, 09:10:35 PM
I reiterate my point. Size of a file does not matter if what you are comparing is the result of the output between to 2 pieces of code.
That's what I said.

Quote
you want to make sure the output produced by the 2 pieces of code are the same.
Agree.

Quote
Size of file does matter in a benchmark, when you are concerned about the way the program is written and the algorithm used. That's is whether you have use the most optimized method when dealing with big files.
 

Yes, it does, but only if you perform a single benchmark. ST did two, so there are two points of reference, and as I noted, a linear formula can be derived from those two data points that roughly approximates a short range of whatever the actual relationship is. A third data point would be enough to fit a parabola, but that doesn't mean the performance relationship is a parabola; it's just all you can do with 3 points. It could very well be a cubic function of the size of the input.

The thing is, here, we can SEE the code. We can see right away why a larger file would make a difference. It doesn't matter if that larger file is larger by a million or a billion bytes; it's still larger, that difference is reflected in the timings, and the reason is rather obvious, as partially evidenced by your quick mention of it.


Quote
Because of the size of the file, you have chosen to read the files in chunks.
That's a direct consequence of taking size into consideration when designing your program. That's why size does matter in a benchmark. 1 million is way different 1 billion!

Oh, yes, of course, because everybody knows that you can't read in chunks for both 1 million and 1 billion. I obviously specifically designed it for the exact size that ST gave, I was in no way trying to make it more generic and efficient for smaller files (which it is, even a 128K file will benefit from chunk reading because it causes less stress on the task allocator and also causes less process memory fragmentation).

If you want to get right down to it, all benchmarks are flawed because of the timing code: it changes the results by being there, but you can't get results without it. The difference is that the same benchmark code surrounds all the timed blocks, so its effect can be ignored when comparing the results.


I will agree that there are certainly instances where a million and a billion make a significant difference algorithm-wise, but by the same token, isn't even the slightest floating-point error a huge difference when it comes to the algorithms for trigonometric functions? What constitutes a significant difference depends on the goal of the code in question. In this case, ST was essentially testing a large file, and that was all I considered: I wasn't making sweeping design changes based on the fact that it was millions as opposed to billions, but rather generic changes where it won't matter whether it is a million or a billion. Will the timing be different for a billion and a million? Of course it will, and I will agree that in that sense the results are flawed. But you assume that my changes are based on his results, when in fact they are based merely on the simple premise that it doesn't work properly for large files. I didn't pay very close attention to the specific timings, because all I needed to know was that it was slower with larger files; I didn't need to know how many milliseconds it took to process X characters.
Title: Re: how to get the Count of string in file
Post by: Salmon Trout on August 16, 2010, 12:14:19 AM
You showed a benchmark between BCP's code and your code, then said BCP's one is sluggish after a while, without stating your reasons or the conclusion of your findings.

I assumed that it was obvious. Sorry you missed it.
Title: Re: how to get the Count of string in file
Post by: Salmon Trout on August 16, 2010, 12:20:38 AM
ST said he downloaded "1 million places of pi", but his file name for testing the benchmark is "1 billion places of pi".

That is absolutely true, I did, but I did then give the file size immediately after.

Quote
(1,000,000,002 bytes) with no carriage returns.

A billion places of decimals? No CRs? One byte per character? One each for the 3 and the decimal point, and 1,000,000,000 for the decimal places.


Title: Re: how to get the Count of string in file
Post by: ghostdog74 on August 16, 2010, 01:30:17 AM
That is absolutely true, I did, but I did then give the file size immediately after.
So now why don't you go edit your post and correct the typo? Change million to billion.
Title: Re: how to get the Count of string in file
Post by: BC_Programmer on August 16, 2010, 01:48:05 AM
So now why don't you go edit your post and correct the typo? Change million to billion.

A) Because he can't.

B) It doesn't really matter.

I mean, come on:

Quote
I downloaded a text file containing 1 million places of pi (1,000,000,002 bytes)

It doesn't take a rocket scientist to see that the bracketed value is in fact 1 billion and 2. Just because this confused you doesn't make it ambiguous, especially since it was later referenced as a billion. In fact, it is only noted as a million in the single quoted passage. And now you are throwing up a shitestorm over an obvious typo that is in no way ambiguous (it's clearly a billion, especially, you know, given that the file size is a billion).
Title: Re: how to get the Count of string in file
Post by: ghostdog74 on August 16, 2010, 03:01:31 AM
A) Because he can't.
Why can't he?
Quote
B) It doesn't really matter.
Yes, it does, especially when you are proving something. A typo is a typo and it should be corrected. If not, it's not clear;
someone might think he meant a million and that all his billions are wrong. Isn't that so?
Title: Re: how to get the Count of string in file
Post by: Salmon Trout on August 16, 2010, 03:32:19 AM
So shall we go through GD74's posts looking for typos? The spirit of Billrich has deeply impregnated this thread.
Title: Re: how to get the Count of string in file
Post by: ghostdog74 on August 16, 2010, 04:17:55 AM
So shall we go through GD74's posts looking for typos?
Go ahead if you are that bored. I don't care. If there is anything I am trying to prove and there are typos, I would be glad to amend it.
Quote
The spirit of Billrich has deeply impregnated this thread
Don't associate me with that guy. If you want to do that, look at yourself in the mirror and tell me why you are any different.
Title: Re: how to get the Count of string in file
Post by: Salmon Trout on August 16, 2010, 04:55:04 AM
Go ahead if you are that bored. I don't care. If there is anything I am trying to prove and there are typos, I would be glad to amend it. Don't associate me with that guy. If you want to do that, look at yourself in the mirror and tell me why you are any different.

offensive; reported
Title: Re: how to get the Count of string in file
Post by: CBMatt on August 16, 2010, 05:03:29 AM
Not quite sure what is so offensive about the comment in question, but this discussion has obviously gone far beyond the original intent of the thread.  In the future, a bit more maturity and less arguing would be preferred.  Topic locked.