Computer Hope

Software => Computer software => Topic started by: bennygill on April 09, 2013, 11:07:08 PM

Title: How to edit a CSV with 2 million rows??
Post by: bennygill on April 09, 2013, 11:07:08 PM
Hey guys,

Really need some quick help with this please.

I have a comma delimited CSV file that holds about 2 millions rows of data (a lot I know but it's the only format I can work with unfortunately). One of the rows has some numbers in it that have lots of decimal places (like 3.908651901). I need to round that entire column - all 2 million rows - to integers.

The problem is that when I open it in excel it says something like "only 1 million rows shown". After more investigation I have found out that excel cannot handle more than 1 millions rows at a time.

My question is this: If I select the whole column and round the numbers to integers will that apply to the entire 2 millions rows? Or only the 1 million showing in excel?

Are there any other programs available that can handle 2-2.5 million rows and change the column to integers?
Title: Re: How to edit a CSV with 2 million rows??
Post by: Geek-9pm on April 09, 2013, 11:33:48 PM
How did you get a program to handle that large a data set?
How many columns?
Did you already try to break the data into a set of two files?
Is there a dependency?  Meaning, is there data at the top that would alter data at the bottom?

EDIT: What! You said two million? Are you sure?
Title: Re: How to edit a CSV with 2 million rows??
Post by: NZF on April 10, 2013, 01:17:22 AM
Hi,
I had a similar problem, (due to collecting data at a high sample rate for a long time). You can open the file in Notepad then split it into smaller sequential files to open in Excel by deleting half of the rows then "Save as" do that twice or three times and Excel will open them for editing. Later I wrote a simple program in Pascal (no one uses that language any more) which did the job on the whole files and just saved the interesting bits.

The actions in excel will only work on the rows loaded, it can't load 2 million rows. You can round the number sto as many sig places as you lie in excel e.g. round(A1,3) rounds to three decimals
Title: Re: How to edit a CSV with 2 million rows??
Post by: Salmon Trout on April 10, 2013, 02:01:55 AM
Max rows 65,536 (Excel 2003) 1,048,576 (Excel 2010) but you can import a large dataset into MS Access as long as the database file size is less than 2 GB
Title: Re: How to edit a CSV with 2 million rows??
Post by: DaveLembke on April 12, 2013, 06:17:38 PM
Quote
Max rows 65,536 (Excel 2003) 1,048,576 (Excel 2010)

Thanks for sharing this Salmon, as for I thought 65,536 was still the limit for rows. I dont have 2010, I moved on to OpenOffice a while back. I hit the limit a few times with a datalogger program passing text data values to CSV file. I was able to correct for this by having it write to a new CSV file when a counter reached 65,000 data samples. Most samples were less than 65k, but if you forget to stop the sampling, the file would continue to grow to a length that was beyond what Excel 2003 supported at the time.

Regarding rounding all to integers, with that many, you have to either break it up into manageable chunks (or)  its probably best to have a program read in all this data from CSV file to an array and then increment through the array to cast the numeric data into integers which are then written to a new comma delimited file that has been converted to integers.

*However I like the database suggestion better myself that Salmon Trout suggested. Way better at handling this much data and processing it any way that you need to. Far more flexible than a 1 time conversion to integers that a program would be designed to do, as well as many many ways to assemble the data if needed for reports and graphs etc.
Title: Re: How to edit a CSV with 2 million rows??
Post by: patio on April 12, 2013, 07:36:51 PM
A csv. file won't even hold 2 million rows...i think he's exaggerating.
Title: Re: How to edit a CSV with 2 million rows??
Post by: Salmon Trout on April 13, 2013, 12:52:24 AM
A csv file won't even hold 2 million rows

A csv file is simply a plain text file and can be as big as the file system and available disk space allow. Limitations may be imposed by the software with which a user chooses to process or display a file.

I believe after a default install of Microsoft Office the .csv file association is grabbed for Excel. Thereafter csv files show up in Windows Explorer with an Excel icon, and the file type is shown as "Microsoft Office Excel Comma Separated Values File (.csv)", and double-clicking one will make Excel (attempt to) open it. I guess this may lead people to suppose that the csv file format is something to do with Microsoft or Excel and that Excel limitations are native to the csv format.

The csv format considerably pre-dates the PC era, they were used by mainframes in the 1960s. There are lots and lots and lots of ways of dealing with csv formatted data apart from MS Excel or similar applications.


Title: Re: How to edit a CSV with 2 million rows??
Post by: Salmon Trout on April 13, 2013, 02:09:27 AM
The excellent freeware program Csved can handle large files: http://csved.sjfrancke.nl/



Title: Re: How to edit a CSV with 2 million rows??
Post by: Salmon Trout on April 13, 2013, 03:52:21 AM
I have just been having some fun; I used a batch to create a csv with dummy values with 100,000 rows, then I copied it 32 times in Windows Explorer (i.e. 5 drag-and-copy here actions) then I used Csved to merge them all into a file having 3,200,000 rows and then used Csved to play around with it. It is surprisingly quick and you can do a surprising number of editing tasks. Thoroughly recommended.
Title: Re: How to edit a CSV with 2 million rows??
Post by: patio on April 13, 2013, 07:32:36 AM
 ;D      8)
Title: Re: How to edit a CSV with 2 million rows??
Post by: Salmon Trout on April 13, 2013, 08:00:25 AM
I have read one report that one person had a problem with Csved with a file of 4 million rows, each with 117 comma separated values. However, it was not clear that the application itself was the reason.
Title: Re: How to edit a CSV with 2 million rows??
Post by: patio on April 13, 2013, 10:21:54 AM
I'm impressed you decided to research this ...well done !
The 65,536 # you mention was the limit i was referring to ...thanx for the correction.
Title: Re: How to edit a CSV with 2 million rows??
Post by: Salmon Trout on April 13, 2013, 01:57:18 PM
I moved on to OpenOffice a while back

OpenOffice.org 1.1.x versions could only handle up to 32,000 rows, and 2.x and 3.x versions prior to 3.3 could manage 65,536, and 3.3 can handle 1,048,576.
Title: Re: How to edit a CSV with 2 million rows??
Post by: DaveLembke on April 16, 2013, 06:55:44 PM
Quote
OpenOffice.org 1.1.x versions could only handle up to 32,000 rows, and 2.x and 3.x versions prior to 3.3 could manage 65,536, and 3.3 can handle 1,048,576.

Thanks for this info Salmon .... its been a long, long time since I dealt with anything greater than 10k rows. That data logger was the only application that was hitting the old Office 2003 Excel limit which I capped to 65k and then start a new log file.

Going to check into Csved as for I remember it being very laggy in Excel with 65k rows and wanting to go to row 45,326 for example to analyze a value at a timed interval and other neighboring intervals to that sample etc. Back when I worked on Loadcells, Strain Gages, Extensometers, and very precise transducers. Here is link to a place I worked at that I had to work with lots of data from data loggers in huge excel files. http://www.geokon.com/ (http://www.geokon.com/)
Title: Re: How to edit a CSV with 2 million rows??
Post by: camerongray on April 17, 2013, 03:05:20 PM
Depending on your experience with any form of programming or scripting it should be fairly simple to write a script to process the file and do your rounding operations.  A language like Perl, PHP or Python would be a good candidate for this.  From experience I've found that simple scripts tend not to struggle when there are large numbers of rows, they just take a little time to sit and work whereas desktop applications like Excel tend to struggle and slow to a crawl when the dataset is large.
Title: Re: How to edit a CSV with 2 million rows??
Post by: Syl1971 on June 10, 2013, 01:55:22 PM
I use a product call Ron's Editor (Google it). Its perfect for this sort of thing, and actually has a number replace function - beleive that? I have to wade through some ugly . . . stuff sometimes and this program just makes my day a little easier.
Title: Re: How to edit a CSV with 2 million rows??
Post by: Salmon Trout on June 10, 2013, 03:44:48 PM
I use a product call Ron's Editor (Google it). Its perfect for this sort of thing

And it costs 25 dollars. I wonder why you rolled in here 2 months after the thread died? Is it because you want to shill a paid for product? I know it has a crippled free version, whereas the excellent CSved is free in its full version.
Title: Re: How to edit a CSV with 2 million rows??
Post by: patio on June 14, 2017, 04:08:05 PM
Topic Closed.