
Topic: delete duplicates in text file


Paul Hen

  • Guest
delete duplicates in text file
« on: May 03, 2005, 02:29:32 PM »
I have a text file with hundreds of e-mail addresses in it (separated by semi-colons) - no, NOT for spamming! I want to get rid of duplicate e-mail addresses within the file so if I broadcast a message to those on my list, I won't annoy anybody by sending it two or three times. Any suggestions would be welcomed, as I have tried everything including a script from MS Technet which I could not get to run.

Flame

  • Moderator


Re: delete duplicates in text file
« Reply #1 on: May 03, 2005, 03:26:14 PM »
Find and replace? (will get old after a while)


Paul Hen

  • Guest
Re: delete duplicates in text file
« Reply #2 on: May 03, 2005, 08:21:52 PM »
Thanks for the reply Flame.
Unfortunately, I am an old guy and likely won't be around long enough to do a search & replace manually.
Any other, more automated approaches are most welcome!

Sidewinder



    Guru

    Thanked: 139
  • Experience: Familiar
  • OS: Windows 10
Re: delete duplicates in text file
« Reply #3 on: May 04, 2005, 07:22:49 AM »
Automation coming up:

Code: [Select]

Const ForReading = 1
Const ForWriting = 2
Dim arrList()

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objInFile = objFSO.OpenTextFile("PathToInputFile", ForReading)
Set objOutFile = objFSO.OpenTextFile("PathToOutputFile", ForWriting, True)

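' Extract each email address from line
'
Do Until objInFile.AtEndOfStream
    strName = objInFile.ReadLine
    arrName = Split(strName, ";")
    ' Go through every piece, including the last one, so an address
    ' that is not followed by a semicolon is not dropped
    For i = 0 To UBound(arrName)
        If Trim(arrName(i)) <> "" Then
            j = j + 1
            ReDim Preserve arrList(j)
            arrList(j) = Trim(arrName(i))
        End If
    Next
Loop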

' Eliminate the dups
'
For i = 1 To UBound(arrList)
    For j = i + 1 To UBound(arrList)
        If arrList(i) = arrList(j) Then
            arrList(j) = ""
        End If
    Next
Next

' Write out what's left in array
'
For i = 1 To UBound(arrList)
    If arrList(i) <> "" Then
        objOutFile.WriteLine arrList(i)
    End If
Next
           
objInFile.Close
objOutFile.Close
WScript.Quit


Change PathToInputFile and PathToOutputFile to the full paths of your own files. Then, from the directory where you saved the script, run it as: cscript scriptname.vbs
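
For a long list, an alternative to comparing every pair of addresses is to keep track of the ones already seen in a Scripting.Dictionary. What follows is only a minimal sketch of that idea, not the script posted above: the placeholder paths are the same ones as before, and treating addresses that differ only in upper/lower case as duplicates is an assumption that may or may not suit your list.

```vbscript
Option Explicit

Const ForReading = 1
Const ForWriting = 2
Const TextCompare = 1   ' same value as the built-in vbTextCompare

Dim objFSO, objSeen, objInFile, objOutFile
Dim arrName, strAddr, i

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objSeen = CreateObject("Scripting.Dictionary")

' Assumption: Bob@example.com and bob@example.com count as the same address.
' CompareMode has to be set while the dictionary is still empty.
objSeen.CompareMode = TextCompare

Set objInFile = objFSO.OpenTextFile("PathToInputFile", ForReading)
Set objOutFile = objFSO.OpenTextFile("PathToOutputFile", ForWriting, True)

Do Until objInFile.AtEndOfStream
    arrName = Split(objInFile.ReadLine, ";")
    For i = 0 To UBound(arrName)
        strAddr = Trim(arrName(i))
        ' Skip empty pieces, e.g. the one after a trailing semicolon
        If strAddr <> "" Then
            ' Write an address only the first time it is seen
            If Not objSeen.Exists(strAddr) Then
                objSeen.Add strAddr, True
                objOutFile.WriteLine strAddr
            End If
        End If
    Next
Loop

objInFile.Close
objOutFile.Close
```

It is run the same way (cscript scriptname.vbs) and writes one address per line, keeping the first spelling of each address it meets.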

Please don't post the same question to different boards. It gets the natives very confused.  ;D

Hope this helps.

8)
« Last Edit: May 04, 2005, 07:59:11 AM by Sidewinder »
The true sign of intelligence is not knowledge but imagination.

-- Albert Einstein

Paul Hen

  • Guest
Re: delete duplicates in text file
« Reply #4 on: May 04, 2005, 01:48:49 PM »
Sidewinder,

Thanks ever so much for your kind assistance. I will try this out and report back. Sorry about the cross post but I wasn't sure whether there was a DOS fix or Windows fix - I would take either!!

In any case, may the gods smile on you and the others that were so helpful as to reply.

I am always neck deep in computer woes, so I will certainly be a regular here now that I have found this site - and will lend a hand when I think my input will be of some value. Nothing like blue screens to make your knowledge grow!!

Paul Hen

  • Guest
Re: delete duplicates in text file
« Reply #5 on: May 05, 2005, 09:01:45 AM »
Sidewinder - please help!

I am so sorry to bother you yet again, but I have tried configuring this script many times and can't get it to run past the line where I am to insert the path & filename. Do you think you could nursemaid me by inserting the following two paths for the input and output files respectively where they belong and in the appropriate format so I can just copy & paste the complete script and run it?

c:\newsletter-duplicates.txt
c:\newsletterNOduplicates.txt

I am blushing as I write this, but I have never had anything to do with macros or scripts and am not familiar at all with the syntax.

gussery

  • Guest
Re: delete duplicates in text file
« Reply #6 on: May 05, 2005, 09:08:26 AM »
I know I am not Sidewinder but this works for me.....
Code: [Select]
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objInFile = objFSO.OpenTextFile("c:\newsletter-duplicates.txt", ForReading)
Set objOutFile = objFSO.OpenTextFile("c:\newsletterNOduplicates.txt", ForWriting, True)

' Extract each email address from line
'
Do Until objInFile.AtEndOfStream
 strName = objInFile.ReadLine
 arrName = Split(strName, ";")
 For i = 0 to UBound(arrName) - 1
  j=j+1
  ReDim Preserve arrList(j)
  arrList(j) = arrName(i)
 Next
Loop

' Eliminate the dups
'
For i = 1 To UBound(arrList)
For j = i + 1 To UBound(arrList)
 If arrList(i) = arrList(j) Then
  arrList(j) = ""
 End If
Next
Next

' Write out what's left in array
'
For i = 1 to UBound(arrList)
If arrList(i) <> "" Then
  objOutFile.Writeline arrList(i)
 End If
Next
 
objInFile.Close
objOutFile.Close
WScript.Quit


Both files have to exist.  The second file should be blank.


Paul Hen

  • Guest
Re: delete duplicates in text file
« Reply #7 on: May 05, 2005, 09:17:55 AM »
Thanks for the help Gussery,

I copied and pasted it directly and ran it with both text files in the appropriate directory - one empty of course.
It errors out with the following:

Line 2 Char 1
Invalid Procedure Call or Argument

Sidewinder



Re: delete duplicates in text file
« Reply #8 on: May 05, 2005, 09:34:57 AM »
Paul,

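You might have missed a few lines during the cut & paste. Without the Const lines at the top, ForReading and ForWriting have no value, which is what triggers the error on line 2.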

Code: [Select]

Const ForReading = 1
Const ForWriting = 2
Dim arrList()


Set objFSO = CreateObject("Scripting.FileSystemObject")  
Set objInFile = objFSO.OpenTextFile("c:\newsletter-duplicates.txt", ForReading)  
Set objOutFile = objFSO.OpenTextFile("c:\newsletterNOduplicates.txt", ForWriting, True)  

' Extract each email address from line  
'  
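Do Until objInFile.AtEndOfStream
    strName = objInFile.ReadLine
    arrName = Split(strName, ";")
    ' Go through every piece, including the last one, so an address
    ' that is not followed by a semicolon is not dropped
    For i = 0 To UBound(arrName)
        If Trim(arrName(i)) <> "" Then
            j = j + 1
            ReDim Preserve arrList(j)
            arrList(j) = Trim(arrName(i))
        End If
    Next
Loop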

' Eliminate the dups  
'  
For i = 1 To UBound(arrList)
    For j = i + 1 To UBound(arrList)
        If arrList(i) = arrList(j) Then
            arrList(j) = ""
        End If
    Next
Next

' Write out what's left in array
'
For i = 1 To UBound(arrList)
    If arrList(i) <> "" Then
        objOutFile.WriteLine arrList(i)
    End If
Next
 
objInFile.Close  
objOutFile.Close
WScript.Quit


Also, the output file does not need to exist before you run the script. The script creates it, and overwrites it if it is already there.

Good luck.  8)
« Last Edit: May 05, 2005, 09:40:05 AM by Sidewinder »

Paul Hen

  • Guest
Re: delete duplicates in text file
« Reply #9 on: May 05, 2005, 11:53:05 AM »
Greetings!

So close . . . The script to generate the list of e-mails without dupes succeeded. What remains is to reinsert the ; separator and to make the list one continuous line as the original was (the new list has each e-mail address on its own line). My concern with the latter is that pasting it into Outlook may be difficult with all the line breaks. But the semi-colon separator has to be there of course.

I could try the machinations through Word etc. but since I will have to repeat this exercise from time to time, it sure would be nice to get the script to do the whole job.

Thanks ever so much!

Sidewinder



Re: delete duplicates in text file
« Reply #10 on: May 05, 2005, 04:27:56 PM »
Here you go:

Code: [Select]

Const ForReading = 1
Const ForWriting = 2
Dim arrList()

Set objFSO = CreateObject("Scripting.FileSystemObject")  
Set objInFile = objFSO.OpenTextFile("c:\newsletter-duplicates.txt", ForReading)  
Set objOutFile = objFSO.OpenTextFile("c:\newsletterNOduplicates.txt", ForWriting, True)  

' Extract each email address from line  
'  
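Do Until objInFile.AtEndOfStream
    strName = objInFile.ReadLine
    arrName = Split(strName, ";")
    ' Go through every piece, including the last one, so an address
    ' that is not followed by a semicolon is not dropped
    For i = 0 To UBound(arrName)
        If Trim(arrName(i)) <> "" Then
            j = j + 1
            ReDim Preserve arrList(j)
            ' Put the separator back on each address
            arrList(j) = Trim(arrName(i)) & ";"
        End If
    Next
Loop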

' Eliminate the dups  
'  
For i = 1 To UBound(arrList)
    For j = i + 1 To UBound(arrList)
        If arrList(i) = arrList(j) Then
            arrList(j) = ""
        End If
    Next
Next

' Write out what's left in array as one line of output
'
For i = 1 To UBound(arrList)
    If arrList(i) <> "" Then
        OneLine = OneLine & arrList(i)
    End If
Next
objOutFile.WriteLine OneLine
 
objInFile.Close  
objOutFile.Close
WScript.Quit
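
If the de-duplicated addresses are held in an array without the separator attached, the single line can also be built with VBScript's built-in Join function. This is only a small sketch of that step: arrUnique and the three example.com addresses are made-up sample data, not taken from the list discussed here.

```vbscript
Option Explicit

Dim arrUnique, strOneLine

' Hypothetical sample data standing in for the de-duplicated list
arrUnique = Array("ann@example.com", "bob@example.com", "cy@example.com")

' Join puts ";" between the items; the final & ";" adds a trailing
' separator so the line matches the format the script above produces
strOneLine = Join(arrUnique, ";") & ";"

WScript.Echo strOneLine
```

Run with cscript, this prints the addresses as one semicolon-separated line that can be pasted into the To field.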


Just curious, but why not create a group in your email program and then add the names to the group?

Good luck.  8)

Paul Hen

  • Guest
Re: delete duplicates in text file
« Reply #11 on: May 05, 2005, 05:18:56 PM »
Worked like a charm - thank you so much. I am VERY impressed with the expertise and such detailed and valuable assistance that is provided here by you all - and for free at that. May the gods be kind to you all! And please be patient as I have several more issues to pose that I would value your thoughts on in respect to other topics.

I can see that knowing one's way around macros and scripts can be a big advantage - especially with repeating tasks. But they seem complex enough that it would take a good deal of time to develop the level of expertise that you and Gussery have.

And I will be adding them to a group if I can, but my Outlook address book ceased to function some time ago and I've not had time to investigate it. Even updating Office to 2003 did not remedy it.