Computer Hope

Microsoft => Microsoft DOS => Topic started by: petreli on October 09, 2009, 10:01:12 AM

Title: Bat or VBS file to count a specific word in multiple word docs
Post by: petreli on October 09, 2009, 10:01:12 AM
Hi

I am looking for code to count the number of times a specific word is used in multiple Word docs, and return the figure in a text file.

I.e.

Scan through several MS Word docs, count the word "apple", and return the figure in a text file.

Is this possible without having to open each document individually?
Title: Re: Bat or VBS file to count a specific word in multiple word docs
Post by: billrich on October 09, 2009, 12:40:28 PM
So many bright, helpful people at Computer Hope.
Title: Re: Bat or VBS file to count a specific word in multiple word docs
Post by: petreli on October 09, 2009, 01:01:00 PM
Thanks, billrich,

but I'm not sure how the code should be entered.

Can you post it as it should be entered, so I can copy and paste?
Title: Re: Bat or VBS file to count a specific word in multiple word docs
Post by: billrich on October 09, 2009, 02:07:15 PM
petreli,

Ask any of the Computer Hope Crew with  2 and 3 thousand posts.

They are the experts.

Good luck

p.s.  Salmon Trout is new but he knows more than anyone. Must be from Canada?
Title: Re: Bat or VBS file to count a specific word in multiple word docs
Post by: billrich on October 09, 2009, 02:10:02 PM
I don't know VBS.  Ask Ghost.
Title: Re: Bat or VBS file to count a specific word in multiple word docs
Post by: gh0std0g74 on October 09, 2009, 05:38:31 PM
@OP, first of all, what bill uses is grep and wc, which are originally *nix tools. There are Windows ports of those tools around, but if you have not downloaded them, you can't use them the way bill did.

Secondly, MS Word docs are binary files, so you can forget about using the batch bill has written; it doesn't work that way. What you have to do is learn VBScript (or PowerShell). See here (http://gallery.technet.microsoft.com/ScriptCenter/en-us/9fec9f03-e1a3-4e57-8e9b-5ecfb28204c0) for an example.

In the Unix world, I use antiword:
Code: [Select]
# antiword test.doc | awk '{for(i=1;i<=NF;i++){if($i~/word/){c++}}}END{print "total count:"c}'
total count:1

There is a Windows port around, see here (http://www.informatik.uni-frankfurt.de/~markus/antiword). I haven't tried it, so I'll leave it to you to explore.
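
For several docs at once, the same awk idea can be wrapped in a loop. This is only a rough sketch, assuming each doc has already been converted to plain text first (for example with antiword writing to a .txt file). The function name count_word_in_files and the file names are made up for illustration, and like the one-liner above it counts fields that contain the word, case-sensitively.

```shell
#!/bin/sh
# count_word_in_files WORD FILE...
# Prints "count<TAB>file" for each plain-text file given.
# Assumption: the files are already text, e.g. produced by
#   antiword report.doc > report.txt     (report.doc is a hypothetical name)
count_word_in_files() {
    word=$1
    shift
    for f in "$@"; do
        awk -v w="$word" -v name="$f" '
            # Same test as the one-liner: count every field that contains the word
            { for (i = 1; i <= NF; i++) if ($i ~ w) c++ }
            END { printf "%d\t%s\n", c, name }
        ' "$f"
    done
}
```

Called as count_word_in_files apple *.txt, it prints one line per file, which can be redirected to a text file with > counts.txt.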
Title: Re: Bat or VBS file to count a specific word in multiple word docs
Post by: billrich on October 09, 2009, 06:48:43 PM
Read the posts by Ghost.  He has the solution.
Title: Re: Bat or VBS file to count a specific word in multiple word docs
Post by: gh0std0g74 on October 09, 2009, 07:50:10 PM
petreli,

The following suggestion, changing Blaney.doc to Blaney.txt, might help.
Be very careful. I would suggest a copy and not a rename (ren):
copy Blaney.doc Blaney.txt

The batch find command seems to work OK:


C:\>find /i /c "City"  Blaney.txt

---------- BLANEY.TXT: 2

C:\>find /i  "City"  Blaney.txt |  wc -l
       4

C:\>

Reference:

Warning: Be very careful and ask for further advice.

Use the Windows Search first to find the word you are after and test only one or two Word Document files.  Good Luck
http://www.pcworld.idg.com.au/forum/290085/search_text_word_dos_documents


Thu, 05/04/2007 - 18:39
Re: Search text in Word for DOS documents
Easy fix. These are actually text files rather than docs in the current sense. If you rename them to .txt rather than .doc you will find that the properties of the files should be unchanged, but surprise surprise, XP will now happily search them. Change a couple to .txt and open them in Notepad or WordPad to confirm no damage is done. Of course, to be really safe, just copy the lot into another directory and change the extensions en masse. To change the lot to .txt all at once, open a command prompt, navigate to the directory containing the files and type ren *.doc *.txt
Enjoy
Chris B

There's no guarantee this method will find the exact count of the word. find /c counts only the lines that contain the word, so it undercounts whenever the word appears several times on one line.
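
To see the difference between counting lines and counting occurrences, here is a small sketch using plain text and the *nix tools mentioned earlier. The sample text and the function names count_lines and count_occurrences are invented for the demonstration; grep -c plays the role of find /c here, and the awk version counts every match on every line.

```shell
#!/bin/sh
# Number of LINES containing the word, ignoring case (what find /i /c reports).
count_lines() {
    grep -i -c -- "$1" "$2"
}

# Number of OCCURRENCES of the word, ignoring case.
# gsub returns how many matches it replaced on the line, so summing it
# over all lines gives the total number of matches in the file.
count_occurrences() {
    awk -v w="$1" '{ line = tolower($0); n += gsub(tolower(w), "&", line) }
                   END { print n + 0 }' "$2"
}

# Hypothetical sample file, created only for this demonstration.
sample=$(mktemp)
printf '%s\n' 'City hall is in the city centre.' \
              'No match here.' \
              'Another CITY line.' > "$sample"

echo "lines:       $(count_lines city "$sample")"        # prints "lines:       2"
echo "occurrences: $(count_occurrences city "$sample")"  # prints "occurrences: 3"
rm -f "$sample"
```

Both counts match substrings, just like find does, so "city" would also be counted inside a longer word.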
Title: Re: Bat or VBS file to count a specific word in multiple word docs
Post by: billrich on October 09, 2009, 08:12:52 PM
I was wrong; I'm sorry I took up your time.
Title: Re: Bat or VBS file to count a specific word in multiple word docs
Post by: gh0std0g74 on October 09, 2009, 09:09:54 PM
Quote
I could not get the VBS code you had a link to... to work.
Next time, show why it doesn't work: error messages, your code, etc.

A stripped-down version:
Code: [Select]
Const wdFindStop = 0            ' Word constants are not predefined in VBScript
Const wdDoNotSaveChanges = 0
strDocPath = "c:\test\test.doc"
strSearch = "word"
iStrCount = 0
Set objWord = CreateObject("Word.Application")
Set objDoc = objWord.Documents.Open(strDocPath)
' Each successful .Execute moves on to the next match, so count until it fails
With objDoc.Content.Find
   .Text = strSearch
   .Format = False
   .Wrap = wdFindStop
   Do While .Execute
      iStrCount = iStrCount + 1
   Loop
End With
If iStrCount = 1 Then
   WScript.Echo strSearch & " appears once in" & vbCrLf & strDocPath
Else
   WScript.Echo strSearch & " appears " & iStrCount & " times in " & vbCrLf & strDocPath
End If
objDoc.Close wdDoNotSaveChanges
objWord.Quit
Set objWord = Nothing

output
Code: [Select]
C:\test>cscript /nologo test.vbs
word appears 4 times in c:\test\test.doc
Title: Re: Bat or VBS file to count a specific word in multiple word docs
Post by: BC_Programmer on October 09, 2009, 09:13:42 PM
countwords.vbs


Code: [Select]
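'Word count VBS.
Const wdDoNotSaveChanges = 0    ' Word constants are not predefined in VBScript
Dim x, y, doc, countOf, strWord, docFile

docFile = "filename.doc"        ' a full path is safer; Word may not look in the script's folder
countOf = 0

Set x = CreateObject("Word.Application")
' Open the file read-only instead of creating a new document from it with Documents.Add
Set doc = x.Documents.Open(docFile, False, True)
For Each y In doc.Words
    strWord = Trim(Replace(y.Text, vbCrLf, ""))
    ' vbTextCompare makes the comparison case-insensitive
    If StrComp(strWord, "and", vbTextCompare) = 0 Then
        countOf = countOf + 1
    End If
Next
doc.Close wdDoNotSaveChanges
x.Quit
Set x = Nothing

WScript.Echo countOf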

Right now it just searches one file. As is, it could be called from a batch file in some fashion.


EDIT: darn it, beaten to the punch. Oh well, mine uses a different method. (I think it's slower though)
Title: Re: Bat or VBS file to count a specific word in multiple word docs
Post by: gh0std0g74 on October 09, 2009, 09:29:30 PM
Quote
Right now it just searches one file. As is, it could be called from a batch file in some fashion.

Here's a full version that searches recursively, starting from the folder set in strFolder. No need for batch... :)
Code: [Select]
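Const wdDoNotSaveChanges = 0     ' Word constants are not predefined in VBScript
strFolder = "c:\test"            ' root folder to search
strSearch = "Word"               ' word to count (case-insensitive)
Set objFS = CreateObject("Scripting.FileSystemObject")
Set objWord = CreateObject("Word.Application")   ' start Word once, not once per file
Go objFS.GetFolder(strFolder)
objWord.Quit wdDoNotSaveChanges
Set objWord = Nothing

Sub Go(objDIR)
  Dim eFolder, objFile, objDoc, y, strword, countof
  ' Skip the protected system folder; otherwise recurse into subfolders first
  If LCase(objDIR.Name) <> "system volume information" Then
    For Each eFolder In objDIR.SubFolders
      Go eFolder
    Next
    For Each objFile In objDIR.Files
      ' LCase so that .DOC and .Doc are picked up as well
      If LCase(objFS.GetExtensionName(objFile.Name)) = "doc" Then
        countof = 0
        Set objDoc = objWord.Documents.Open(objFile.Path, False, True)   ' read-only
        For Each y In objDoc.Words
          strword = Trim(Replace(y.Text, vbCrLf, ""))
          If StrComp(strword, strSearch, vbTextCompare) = 0 Then
            countof = countof + 1
          End If
        Next
        objDoc.Close wdDoNotSaveChanges
        WScript.Echo countof, objFile.Path
      End If
    Next
  End If
End Sub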

Title: Re: Bat or VBS file to count a specific word in multiple word docs
Post by: billrich on October 09, 2009, 09:51:31 PM
VBS will replace all Batch.  Too bad I'm not bright enough to use it.

Such Great solutions.  VBS will also replace "C" , "C++"  and all of Unix and shell programming.

The Computer Hope Crew is on the leading Edge.
Title: Re: Bat or VBS file to count a specific word in multiple word docs
Post by: billrich on October 09, 2009, 10:09:15 PM
Great job by the Computer Hope Crew.
Title: Re: Bat or VBS file to count a specific word in multiple word docs
Post by: gh0std0g74 on October 09, 2009, 11:19:25 PM

Quote
C:\>Where is the OutPut?  I can get No Output from BC or Ghost.
The error is pretty obvious: you have a corrupt Word file. Use the attachment I've given.

Quote
How do we know the VBS  code countwords works?
Isn't that trivial? Create a new Word document. Type in sample text containing the words you want to search for. Save it as a Word .doc file. Execute the script, and if the count is correct, it works.

my output
Code: [Select]
C:\test>cscript /nologo test.vbs
0 C:\test\test\test2.doc
5 C:\test\test.doc
0 C:\test\test1.doc



[Saving space, attachment deleted by admin]
Title: Re: Bat or VBS file to count a specific word in multiple word docs
Post by: BC_Programmer on October 10, 2009, 01:58:34 PM
Dude, if you are going to provide him your solutions in full, go ahead. Depending on my mood, I am not obliged to solve everything for the OP.

If I really wanted to nitpick about your code, I could (for example: input argument checking, input file format ambiguity), but I will not. So would you just give it a rest...



To sum it up.

Billrich, you are not a programmer. That in and of itself is not a problem (I am not a programmer by profession either), but I have actually read "real" programming books, including the language references for a number of programming languages as well as their corresponding programmer's guides, and I'm sure ghostdog has read a few as well. Not to mention Code Complete (bloody fine book), either. I wrote a complete replacement for the very same FileSystemObject we are using here, one that adds far too many features to count. Shall I present the code for that so you may point out if I should be calling CloseHandle() somewhere I forgot? Or will you actually understand that the class-based implementation will call CloseHandle() in the Terminate event? Or, more precisely, do you understand events? Probably not... but you know what? That's fine. You're learning VBScript. This is good. It's good to ask questions about code. But to declare that the program doesn't work when the error message is quite plain is nothing short of plain ol' silly.

Was it a text file? come on, admit it. it was a text file you renamed to doc. it's ok. Have some pie. Not all of it, I want some.

I'm tired of your pedantic arguments against other people's VBScript or batch, with frivolous claims such as "it does not work" because you can't figure out the command-line syntax, or because you decide that, despite any input format the OP defines, you'll call shenanigans because the batch or VBS script breaks when you insert characters, or because the code isn't "forward thinking" in your eyes, or some other futile attempt at whatever maligned argument you can muster; and if you can't think of any, you just make up some kind of analogy involving farm animals.


I might also add that the scripts I put up here I just kinda whip up as fast as possible. But I always test it. If not posting a "sample output" inconveniences you that's a *censored* shame.
Title: Re: Bat or VBS file to count a specific word in multiple word docs
Post by: billrich on October 11, 2009, 07:44:36 AM
To All:

Good Job.
Title: Re: Bat or VBS file to count a specific word in multiple word docs
Post by: BC_Programmer on October 11, 2009, 09:10:26 AM
You have no idea what you're talking about and therefore I suggest you STFU.

Quote
To not show the output of your code is a red flag that your code does not work.

No, it isn't. For a Word document I tested, the output was 4. Was that somehow proof that it does work? No.

However people who try to edit the scripts given and then complain when the EDITED script doesn't work, and when it does work they complain about the output in some pedantic fashion... that is a red flag that they have no idea what they are talking about.


In fact the more I think about it the more I think you may have taken lessons from spectateswamp.

Title: Re: Bat or VBS file to count a specific word in multiple word docs
Post by: Salmon Trout on October 11, 2009, 10:09:26 AM
BC_P, having crossed swords with Billrich myself, I can only sympathetically suggest you do what I do, which is basically ignore the bugger. I won't bother trying to correct his "code" any longer. If his half baked outpourings are all that the original poster ends up seeing, well, sadly, so be it. Life's too short.

"BC Hat" indeed!  :)

Is that near Medicine Hat? (No I checked, it's in AB)





Title: Re: Bat or VBS file to count a specific word in multiple word docs
Post by: billrich on October 11, 2009, 11:18:38 AM
Way to go Computer Hope crew.
Title: Re: Bat or VBS file to count a specific word in multiple word docs
Post by: BC_Programmer on October 11, 2009, 12:22:32 PM

Quote
Point to one post in this thread that helped Petreli, the original poster, or anyone who might have read this thread.
"Bat or VBS file to count a specific word in multiple word docs?"

The Fishman, Mr. Trout, does not have answer?

Ghostdog's script counts a specific word in multiple Word docs. For some reason I believe that qualifies.

Title: Re: Bat or VBS file to count a specific word in multiple word docs
Post by: billrich on October 11, 2009, 01:25:30 PM
Good Job; Great help by all.