Welcome guest. Before posting on our computer help forum, you must register. Click here it's easy and free.

Author Topic: How to edit a CSV with 2 million rows??  (Read 74439 times)

0 Members and 1 Guest are viewing this topic.

bennygill

    Topic Starter


    Newbie

    • Experience: Familiar
    • OS: Windows 7
    How to edit a CSV with 2 million rows??
    « on: April 09, 2013, 11:07:08 PM »
    Hey guys,

    Really need some quick help with this please.

    I have a comma delimited CSV file that holds about 2 millions rows of data (a lot I know but it's the only format I can work with unfortunately). One of the rows has some numbers in it that have lots of decimal places (like 3.908651901). I need to round that entire column - all 2 million rows - to integers.

    The problem is that when I open it in excel it says something like "only 1 million rows shown". After more investigation I have found out that excel cannot handle more than 1 millions rows at a time.

    My question is this: If I select the whole column and round the numbers to integers will that apply to the entire 2 millions rows? Or only the 1 million showing in excel?

    Are there any other programs available that can handle 2-2.5 million rows and change the column to integers?

    Geek-9pm


      Mastermind
    • Geek After Dark
    • Thanked: 1026
      • Gekk9pm bnlog
    • Certifications: List
    • Computer: Specs
    • Experience: Expert
    • OS: Windows 10
    Re: How to edit a CSV with 2 million rows??
    « Reply #1 on: April 09, 2013, 11:33:48 PM »
    How did you get a program to handle that large a data set?
    How many columns?
    Did you already try to break the data into a set of two files?
    Is there a dependency?  Meaning, is there data at the top that would alter data at the bottom?

    EDIT: What! You said two million? Are you sure?

    NZF



      Newbie

      • Experience: Familiar
      • OS: Windows 7
      Re: How to edit a CSV with 2 million rows??
      « Reply #2 on: April 10, 2013, 01:17:22 AM »
      Hi,
      I had a similar problem, (due to collecting data at a high sample rate for a long time). You can open the file in Notepad then split it into smaller sequential files to open in Excel by deleting half of the rows then "Save as" do that twice or three times and Excel will open them for editing. Later I wrote a simple program in Pascal (no one uses that language any more) which did the job on the whole files and just saved the interesting bits.

      The actions in excel will only work on the rows loaded, it can't load 2 million rows. You can round the number sto as many sig places as you lie in excel e.g. round(A1,3) rounds to three decimals

      Salmon Trout

      • Guest
      Re: How to edit a CSV with 2 million rows??
      « Reply #3 on: April 10, 2013, 02:01:55 AM »
      Max rows 65,536 (Excel 2003) 1,048,576 (Excel 2010) but you can import a large dataset into MS Access as long as the database file size is less than 2 GB

      DaveLembke



        Sage
      • Thanked: 662
      • Certifications: List
      • Computer: Specs
      • Experience: Expert
      • OS: Windows 10
      Re: How to edit a CSV with 2 million rows??
      « Reply #4 on: April 12, 2013, 06:17:38 PM »
      Quote
      Max rows 65,536 (Excel 2003) 1,048,576 (Excel 2010)

      Thanks for sharing this Salmon, as for I thought 65,536 was still the limit for rows. I dont have 2010, I moved on to OpenOffice a while back. I hit the limit a few times with a datalogger program passing text data values to CSV file. I was able to correct for this by having it write to a new CSV file when a counter reached 65,000 data samples. Most samples were less than 65k, but if you forget to stop the sampling, the file would continue to grow to a length that was beyond what Excel 2003 supported at the time.

      Regarding rounding all to integers, with that many, you have to either break it up into manageable chunks (or)  its probably best to have a program read in all this data from CSV file to an array and then increment through the array to cast the numeric data into integers which are then written to a new comma delimited file that has been converted to integers.

      *However I like the database suggestion better myself that Salmon Trout suggested. Way better at handling this much data and processing it any way that you need to. Far more flexible than a 1 time conversion to integers that a program would be designed to do, as well as many many ways to assemble the data if needed for reports and graphs etc.

      patio

      • Moderator


      • Genius
      • Maud' Dib
      • Thanked: 1769
        • Yes
      • Experience: Beginner
      • OS: Windows 7
      Re: How to edit a CSV with 2 million rows??
      « Reply #5 on: April 12, 2013, 07:36:51 PM »
      A csv. file won't even hold 2 million rows...i think he's exaggerating.
      " Anyone who goes to a psychiatrist should have his head examined. "

      Salmon Trout

      • Guest
      Re: How to edit a CSV with 2 million rows??
      « Reply #6 on: April 13, 2013, 12:52:24 AM »
      A csv file won't even hold 2 million rows

      A csv file is simply a plain text file and can be as big as the file system and available disk space allow. Limitations may be imposed by the software with which a user chooses to process or display a file.

      I believe after a default install of Microsoft Office the .csv file association is grabbed for Excel. Thereafter csv files show up in Windows Explorer with an Excel icon, and the file type is shown as "Microsoft Office Excel Comma Separated Values File (.csv)", and double-clicking one will make Excel (attempt to) open it. I guess this may lead people to suppose that the csv file format is something to do with Microsoft or Excel and that Excel limitations are native to the csv format.

      The csv format considerably pre-dates the PC era, they were used by mainframes in the 1960s. There are lots and lots and lots of ways of dealing with csv formatted data apart from MS Excel or similar applications.


      « Last Edit: April 13, 2013, 01:16:16 AM by Salmon Trout »

      Salmon Trout

      • Guest
      Re: How to edit a CSV with 2 million rows??
      « Reply #7 on: April 13, 2013, 02:09:27 AM »
      The excellent freeware program Csved can handle large files: http://csved.sjfrancke.nl/




      Salmon Trout

      • Guest
      Re: How to edit a CSV with 2 million rows??
      « Reply #8 on: April 13, 2013, 03:52:21 AM »
      I have just been having some fun; I used a batch to create a csv with dummy values with 100,000 rows, then I copied it 32 times in Windows Explorer (i.e. 5 drag-and-copy here actions) then I used Csved to merge them all into a file having 3,200,000 rows and then used Csved to play around with it. It is surprisingly quick and you can do a surprising number of editing tasks. Thoroughly recommended.

      patio

      • Moderator


      • Genius
      • Maud' Dib
      • Thanked: 1769
        • Yes
      • Experience: Beginner
      • OS: Windows 7
      Re: How to edit a CSV with 2 million rows??
      « Reply #9 on: April 13, 2013, 07:32:36 AM »
       ;D      8)
      " Anyone who goes to a psychiatrist should have his head examined. "

      Salmon Trout

      • Guest
      Re: How to edit a CSV with 2 million rows??
      « Reply #10 on: April 13, 2013, 08:00:25 AM »
      I have read one report that one person had a problem with Csved with a file of 4 million rows, each with 117 comma separated values. However, it was not clear that the application itself was the reason.

      patio

      • Moderator


      • Genius
      • Maud' Dib
      • Thanked: 1769
        • Yes
      • Experience: Beginner
      • OS: Windows 7
      Re: How to edit a CSV with 2 million rows??
      « Reply #11 on: April 13, 2013, 10:21:54 AM »
      I'm impressed you decided to research this ...well done !
      The 65,536 # you mention was the limit i was referring to ...thanx for the correction.
      " Anyone who goes to a psychiatrist should have his head examined. "

      Salmon Trout

      • Guest
      Re: How to edit a CSV with 2 million rows??
      « Reply #12 on: April 13, 2013, 01:57:18 PM »
      I moved on to OpenOffice a while back

      OpenOffice.org 1.1.x versions could only handle up to 32,000 rows, and 2.x and 3.x versions prior to 3.3 could manage 65,536, and 3.3 can handle 1,048,576.

      DaveLembke



        Sage
      • Thanked: 662
      • Certifications: List
      • Computer: Specs
      • Experience: Expert
      • OS: Windows 10
      Re: How to edit a CSV with 2 million rows??
      « Reply #13 on: April 16, 2013, 06:55:44 PM »
      Quote
      OpenOffice.org 1.1.x versions could only handle up to 32,000 rows, and 2.x and 3.x versions prior to 3.3 could manage 65,536, and 3.3 can handle 1,048,576.

      Thanks for this info Salmon .... its been a long, long time since I dealt with anything greater than 10k rows. That data logger was the only application that was hitting the old Office 2003 Excel limit which I capped to 65k and then start a new log file.

      Going to check into Csved as for I remember it being very laggy in Excel with 65k rows and wanting to go to row 45,326 for example to analyze a value at a timed interval and other neighboring intervals to that sample etc. Back when I worked on Loadcells, Strain Gages, Extensometers, and very precise transducers. Here is link to a place I worked at that I had to work with lots of data from data loggers in huge excel files. http://www.geokon.com/

      camerongray



        Expert
      • Thanked: 306
        • Yes
        • Cameron Gray - The Random Rambings of a Computer Geek
      • Certifications: List
      • Computer: Specs
      • Experience: Expert
      • OS: Mac OS
      Re: How to edit a CSV with 2 million rows??
      « Reply #14 on: April 17, 2013, 03:05:20 PM »
      Depending on your experience with any form of programming or scripting it should be fairly simple to write a script to process the file and do your rounding operations.  A language like Perl, PHP or Python would be a good candidate for this.  From experience I've found that simple scripts tend not to struggle when there are large numbers of rows, they just take a little time to sit and work whereas desktop applications like Excel tend to struggle and slow to a crawl when the dataset is large.