Computer Hope

Microsoft => Microsoft DOS => Topic started by: carmine on December 04, 2014, 11:01:47 PM

Title: Update CSV files
Post by: carmine on December 04, 2014, 11:01:47 PM
Hi,

There are 30 CSV files in folder and i want to update all of them, the format of files are same.

Matching rows by comparing Col A, B, and C. If match(s) found, the duplicate rows to be deleted and their contents are appended to the original.

.csv file.

A|B|C|D|E|F|G|H|.........
x|y|z|1|2|3|4|5|
x|y|z|7|8|9|1|2|
o|p|q|9|6|3|2|1|
x|y|z|3|4|7|8|1|

modified.csv
A|B|C|D|E|F|G|H|.........
x|y|z|1|2|3|4|5|7|8|9|1|2|3|4|7|8|1|
o|p|q|9|6|3|2|1|

Any ways.





Title: Re: Update CSV files
Post by: Salmon Trout on December 05, 2014, 04:43:52 AM
That is not a csv file format. The 'c' is 'csv' stands for 'comma'. Your column separator is a pipe character '|'. So please clarify.

However...

1. Must it be done using batch? Can VBScript be used? If not, why not?
2. Can temp files be written somewhere?
3. Does each column contain just 1 character? If not, please supply more representative example data.

Title: Re: Update CSV files
Post by: carmine on December 05, 2014, 06:07:51 AM
That is not a csv file format. The 'c' is 'csv' stands for 'comma'. Your column separator is a pipe character '|'. So please clarify.

However...

1. Must it be done using batch? Can VBScript be used? If not, why not?
2. Can temp files be written somewhere?
3. Does each column contain just 1 character? If not, please supply more representative example data.

Thanks Salmon for reply.
I agree its Pipe-Delimiter file exported from excel and saved as .csv
1. Approx. there are 20000 to 25000 rows and 200 columns. I don't mine if its more faster/flexible to achieve with VBScript, because i have atleast 30 files in which continuously new data is appended and once in month i will have to update/re-arrange it.
2. Yes we can create the temp files. c:\Temp.
3. No, total data is numeric not a single character. Col1=2 numeric, Col2=4 numeric, Col3=1 to 7 numericals and from Col4 fixed 6 numeric.
Sample data:
ST|BRIN|TP|KEY1|KEY2|KEY3|KEY4|KEY5|KEY6|KEY7|KEY8|KEY9|KEY10|KEY11|KEY12|KEY13|KEY14|KEY15|KEY16|KEY17|KEY18|KEY19|KEY20|KEY21|KEY22|KEY23|KEY24|KEY25|KEY26|KEY27|KEY28|KEY29|KEY30|KEY31|KEY32|KEY33|KEY34|KEY35|KEY36|KEY37|KEY38|KEY39|KEY40|KEY41|KEY42|KEY43|KEY44|KEY45|KEY46|KEY47|KEY48|KEY49|KEY50|KEY51|KEY52|KEY53|KEY54|KEY55|KEY56|KEY57|KEY58|KEY59|KEY60|
53|5305|1|722101|722102|722103|722104|722105|722106|722107|722108|722109|722110|722111|722112|722113|722114|722115|722116|722117|722118|722119|722120|722121|722122|722123|722124|722125|722126|722127|722128|722129|722130|722131|722132|722133|722134|722135|722136|722137|722138|722139|722140|722141|722142|722143|722144|722145|722146|722147|722148|722149|722150|
53|5305|1|745551|745552|745553|745554|745555|745556|745557|745558|745559|745560|745561|745562|745563|745564|745565|745566|745567|745568|745569|745570|745571|745572|745573|745574|745575|745576|745577|745578|745579|745580|745581|745582|745583|745584|745585|745586|745587|745588|745589|745590|745591|745592|745593|745594|745595|745596|745597|745598|745599|745600|
53|5305|1|823126|813002|813003|813004|813005|813006|813007|813008|813009|813010|813011|813012|813013|813014|813015|813016|813017|813018|813019|813020|813021|813022|813023|813024|813025|813026|813027|813028|813029|813030|813031|813032|813033|813034|813035|813036|813037|813038|813039|813040|813041|813042|813043|813044|813045|813046|813047|813048|813049|813050|
37|2037|22|575196|514202|514203|514204|514205|514206|514207|514208|514209|514210|514211|514212|514213|514214|514215|514216|514217|514218|514219|514220|514221|514222|514223|514224|514225|514226|514227|514228|514229|514230|514231|514232|514233|514234|514235|514236|514237|514238|514239|514240|514241|514242|514243|514244|514245|514246|514247|514248|514249|514250|
37|2037|22|713951|713952|713953|713954|713955|713956|713957|713958|713959|713960|713961|713962|713963|713964|713965|713966|713967|713968|713969|713970|713971|713972|713973|713974|713975|713976|713977|713978|713979|713980|713981|713982|713983|713984|713985|713986|713987|713988|713989|713990|713991|713992|713993|713994|713995|713996|713997|713998|713999|714000|
Title: Re: Update CSV files
Post by: dbenham on December 05, 2014, 09:57:05 AM
The basic algorithm is fairly simple.

1) sort the input descending so header remains at top.
2) initialize PreviousKey to something that will never match
3) for each line: (note, writes are without \n unless explicitly stated)
     - if 1st line than write line
     - else if key = previousKey write nonKey data
     - else write \n + entire line
4) terminate the last line with \n

This can be done easily with VBScript or JScript or PowerShell.

Or you could use batch coupled with JREPL.BAT (http://www.dostips.com/forum/viewtopic.php?f=3&t=6044)
Code: [Select]
@echo off
sort /r %1 /o %1
call jrepl "^((?:.*?\|){3})(.*)" "output.Write($1==prev?$2:(prev?'\r\n':'')+$0);prev=$1;false;" /jmatch /jbeg "var prev" /f %1 /o -
echo(>>%1

I suppose it could be done with pure batch, but I would worry about performance, as well as the potential for lines to exceed the max batch string length of ~8191 bytes.
Title: Re: Update CSV files
Post by: Salmon Trout on December 05, 2014, 10:26:24 AM
Thanks, dbenham! I was inclining towards VBScript, because of the Dictionary Object but you have ridden to the rescue!