Computer Hope

Software => Computer software => Topic started by: jonnyc on November 01, 2011, 09:56:12 AM

Title: Trying to turn PDF file into Comma-Delimited file (single file format)
Post by: jonnyc on November 01, 2011, 09:56:12 AM
I'm looking for all the help I can get on this.

I have a racing program that runs off of single file data.

I am wondering whether I can turn past-performance PDF files into these comma-delimited files?

Here is a PDF (click on Free Sample Ultimate PP's)- http://www.brisnet.com/cgi-bin/HTML/samples.html#uwb

Here is a data file (click on Free Sample PP Data Files (single)) - http://www.brisnet.com/cgi-bin/HTML/samples.html#drs

Thanks for any and all help on this. I am trying to get something done by Thursday night.

John
Title: Re: Trying to turn PDF file into Comma-Delimited file (single file format)
Post by: soybean on November 01, 2011, 10:25:03 AM

Here is a PDF (click on Free Sample Ultimate PP's)- http://www.brisnet.com/cgi-bin/HTML/samples.html#uwb

Here is a data file (click on Free Sample PP Data Files (single)) - http://www.brisnet.com/cgi-bin/HTML/samples.html#drs

Are the above alternative sources of the data, or what?
Title: Re: Trying to turn PDF file into Comma-Delimited file (single file format)
Post by: jonnyc on November 01, 2011, 12:58:14 PM
Yes. I would like to turn that PDF into the Data File.

Thanks
Title: Re: Trying to turn PDF file into Comma-Delimited file (single file format)
Post by: soybean on November 01, 2011, 04:14:45 PM
Well, I think it will be a laborious process, but you could copy the text from the PDF to a text file and then import it into Excel.  I think it will be a space-delimited file, not comma-delimited.  At least, that's what I see based on 3 lines I copied from one of the PDFs.  Here are the 3 lines:

12 - 2 1 9 Dec 02 ECHUC 3yo Mdn Plte 5 5/8 f GD 118- 6 Nicholas R
11 - 11 5 30 Nov 02 M.V. 3yo C&G Hcp 5 f YD 114- 6 M Pumpa
7 - 3 4 19 Nov 02 BALL'T 3yo+ Mdn Plte 5 f GD 123- 5 A Findlay

Note that this will not be a smooth process due to differences in the 3 rows.  For example, the 1st row has "5 5/8" where the 2nd and 3rd rows have just 5.  The 3rd row has a + after 3yo, while the first 2 rows do not.  Such differences will complicate the process of importing and parsing the data into columns.  But I may be misinterpreting what you want to accomplish here.
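
To illustrate the kind of parsing those differences call for, here is a minimal Python sketch that turns the 3 lines above into comma-delimited rows with a regular expression instead of a plain space split. The field names (n1, race, going, and so on) are guesses made up for this example, and the pattern is fitted only to these 3 lines, so it should not be taken as the actual layout of the data file.

```python
import csv
import io
import re

# The three lines quoted above, copied from the PDF.
SAMPLE = [
    "12 - 2 1 9 Dec 02 ECHUC 3yo Mdn Plte 5 5/8 f GD 118- 6 Nicholas R",
    "11 - 11 5 30 Nov 02 M.V. 3yo C&G Hcp 5 f YD 114- 6 M Pumpa",
    "7 - 3 4 19 Nov 02 BALL'T 3yo+ Mdn Plte 5 f GD 123- 5 A Findlay",
]

# Group names are assumptions for illustration; the PDF does not label them.
ROW = re.compile(r"""
    ^(?P<n1>\d+)\s*-\s*(?P<n2>\d+)\s+(?P<n3>\d+)\s+   # three leading numbers
    (?P<date>\d{1,2}\s+[A-Z][a-z]{2}\s+\d{2})\s+      # day, month, 2-digit year
    (?P<track>\S+)\s+                                 # one token, no spaces
    (?P<race>.+?)\s+                                  # variable number of words
    (?P<distance>\d+(?:\s+\d+/\d+)?)\s*f\s+           # whole number, optional fraction
    (?P<going>[A-Z]{2})\s+                            # two capital letters
    (?P<weight>\d+)-\s*(?P<n4>\d+)\s+                 # number, dash, number
    (?P<name>.+)$                                     # rest of the line
""", re.VERBOSE)


def to_csv(lines):
    """Return comma-delimited text for every line the pattern matches."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        match = ROW.match(line.strip())
        if match:
            writer.writerow(match.groups())
    return out.getvalue()


print(to_csv(SAMPLE), end="")
# 12,2,1,9 Dec 02,ECHUC,3yo Mdn Plte,5 5/8,GD,118,6,Nicholas R
# 11,11,5,30 Nov 02,M.V.,3yo C&G Hcp,5,YD,114,6,M Pumpa
# 7,3,4,19 Nov 02,BALL'T,3yo+ Mdn Plte,5,GD,123,5,A Findlay
```

Anchoring on the fixed pieces (the date, the "f" after the distance, the two-letter code) is what lets the variable-length parts in between stay in a single column. Lines that do not fit the pattern are simply skipped in this sketch.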
Title: Re: Trying to turn PDF file into Comma-Delimited file (single file format)
Post by: Geek-9pm on November 01, 2011, 04:58:49 PM
Must it be a DOS batch file?
How much work can be done by hand?
How much data must be processed?
There are tools that will let you process the data in Excel as a spreadsheet. You can use VBA to edit some material. But it is not batch and requires some user interaction.
Here is part of a huge text file I created in a few keystrokes. I think it can be parsed by VBA in Excel. Or something like that.

EDIT: To view the attached file, use Notepad and set the font to Courier 8 with no word wrap.


[regaining space - attachment deleted by admin]
Title: Re: Trying to turn PDF file into Comma-Delimited file (single file format)
Post by: soybean on November 01, 2011, 06:07:21 PM
There are tools that will let you process the data in Excel as a spreadsheet. You can use VBA to edit some material. But it is not batch and requires some user interaction.
Here is part of a huge text file I created in a few keystrokes. I think it can be parsed by VBA in Excel. Or something like that.
I think the test file you posted would not work well.  The reason is what I said in my previous post.  Parsing a data file in Excel works best when all rows have exactly the same data elements, meaning they would all yield the same number of columns when parsed.  As I said in my previous post, and as you can see in your test file, there are differences between rows.  You also have header rows, which complicate the procedure.
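
A quick way to see the problem is to count how many columns a plain split on spaces would produce for each row. The sketch below is only an illustration in Python, using the 3 lines from my earlier post as the sample; the function names are made up for this example.

```python
from collections import Counter

# The three lines copied from the PDF in the earlier post.
SAMPLE = [
    "12 - 2 1 9 Dec 02 ECHUC 3yo Mdn Plte 5 5/8 f GD 118- 6 Nicholas R",
    "11 - 11 5 30 Nov 02 M.V. 3yo C&G Hcp 5 f YD 114- 6 M Pumpa",
    "7 - 3 4 19 Nov 02 BALL'T 3yo+ Mdn Plte 5 f GD 123- 5 A Findlay",
]


def field_counts(lines):
    """Number of space-separated pieces in each non-blank line."""
    return [len(line.split()) for line in lines if line.strip()]


def odd_rows(lines):
    """Return (line number, count) for rows whose count differs from the most common one."""
    counts = field_counts(lines)
    usual = Counter(counts).most_common(1)[0][0]
    return [(i, n) for i, n in enumerate(counts, start=1) if n != usual]


print(field_counts(SAMPLE))  # [19, 18, 18]
print(odd_rows(SAMPLE))      # [(1, 19)]
```

The first row comes out one column wider because of the "5 5/8", so everything after it would land one column to the right in Excel. Header rows would show up in the same kind of check, since their counts would not match the data rows either.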
Title: Re: Trying to turn PDF file into Comma-Delimited file (single file format)
Post by: Geek-9pm on November 01, 2011, 06:21:51 PM
Right. You can convert it to a nice text file using Foxit Reader. But not all PDF files will allow it. I only tested it on one file.
http://foxitsoftware.com/downloads/

Not being up to speed with Python, I would use some dialect of MS BASIC to parse the lines and put them into some form that is more digestible for Excel.

Of course, you can parse the text with a batch file, if you like doing that kind of thing. Maybe PowerShell would be a better choice.

I think there has to be some user interaction to handle all the exceptions you will get. Otherwise it will be a script with a long list of 'what if' stuff.
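
One way to keep that list short, sketched here in Python rather than BASIC, is to let the script accept only the lines it clearly recognizes and set everything else aside for a person to look at. This is only a rough sketch: the check is deliberately loose and is fitted to the sample lines posted earlier in the thread, and the header line in the sample is invented for illustration, not taken from the PDF.

```python
import re

# Assumed shape of the start of a data row, e.g. "12 - 2 1 9 Dec 02".
LOOKS_LIKE_ROW = re.compile(
    r"^\d+\s*-\s*\d+\s+\d+\s+\d{1,2}\s+[A-Z][a-z]{2}\s+\d{2}\b"
)


def triage(lines):
    """Split lines into (accepted rows, [(line number, text)] needing review)."""
    accepted, review = [], []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # ignore blank lines
        if LOOKS_LIKE_ROW.match(text):
            accepted.append(text)
        else:
            review.append((number, text))
    return accepted, review


sample = [
    "12 - 2 1 9 Dec 02 ECHUC 3yo Mdn Plte 5 5/8 f GD 118- 6 Nicholas R",
    "Form Date Track Class Dist",  # invented header line, for illustration only
    "11 - 11 5 30 Nov 02 M.V. 3yo C&G Hcp 5 f YD 114- 6 M Pumpa",
]

accepted, review = triage(sample)
print(len(accepted))  # 2
print(review)         # [(2, 'Form Date Track Class Dist')]
```

The review list keeps the original line numbers, so the leftovers can be found in the text file and fixed by hand.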