Computer Hope

Microsoft => Microsoft DOS => Topic started by: hhaji on March 29, 2018, 01:40:17 PM

Title: Need help with a simple batch script
Post by: hhaji on March 29, 2018, 01:40:17 PM
Hi,

This is my first post, I am new to this forum.

I would appreciate your help. 

I have a full-time job unrelated to programming, but I could benefit from a script like the one below to help me automate data extraction.

We use the Windows 10 operating system.

We have an old system in place, and the following automation would cut down on the time I waste manually extracting data from a bunch of PDF files that combine text and pictures.

I found that the steps below, done manually, do the job, but they need to be turned into a batch job.

Please feel free to post your reply on the forum or email me directly.

Rgds,

----------
# need a script.bat or .cmd to do the following

for i = 1 to 100

copy D: "example"+"i".pdf to C: "example"+"i".pdf

# when in C: drive
# do the following 4 subroutines

open example"i".pdf
save as example"i".jpg

open example"i".jpg
print test"i".pdf      # this will allow me to extract numbers from a picture

open test"i".pdf
save as test"i".xls

open test"i".xls
Vlookup x,y,z,w

Rtn (for i=)

-----------
 :)
Title: Re: Need help with a simple batch script
Post by: Squashman on March 29, 2018, 11:10:32 PM
What programming language is that.  It certainly is not a Windows .bat file.
Title: Re: Need help with a simple batch script
Post by: Geek-9pm on March 30, 2018, 12:19:09 AM
Quote
We have an old system in place, and the following automation would cut down on the time I waste manually extracting data from a bunch of PDF files that combine text and pictures.
Explain how you do it manually.
Your manual method can be used as a model for a script.
First tell us what you need to do.

What do you wish to get out of the PDF documents?



Title: Re: Need help with a simple batch script
Post by: hhaji on March 30, 2018, 07:16:09 AM
Sorry folks for not being clear,

This is not in any particular language. I just typed in my actions as I would do the process manually, step by step. I left out a bunch of sub-steps for brevity.

My hope is to get feedback on what these steps would look like when written in valid syntax in a typical .bat script, so that I can run it as is, with some minor edits for the proper folder location etc. I am assuming that the script can open a pdf file and save it as jpg for example - or am I incorrect on that?

rgds,
Title: Re: Need help with a simple batch script
Post by: Squashman on March 30, 2018, 07:43:27 AM
Quote
I am assuming that the script can open a pdf file and save it as jpg for example - or am I incorrect on that?
What program are you using that does that?  Does it have command line capability?
Title: Re: Need help with a simple batch script
Post by: hhaji on March 30, 2018, 01:22:23 PM
That is exactly why I am here at this forum asking for clear syntax instructions. What specific text, with what punctuation, do I need to type into Notepad and save as foo.bat in order to run the batch job at the C> prompt or the Windows command line?

For example, if you had to write a script to do the following, what would your .bat script look like?

1.   Open foo.pdf
2.   Save as foo.jpg
3.   Open foo.jpg
4.   Print to pdf as FOOO.pdf
5.   Open FOOO.pdf
6.   Export to foo.xls

Mind you, my full-time job has nothing to do with programming, but if I get a partially working example from one of you experts I can play with it till I get it to work - with some possible additional help :)
Title: Re: Need help with a simple batch script
Post by: patio on March 30, 2018, 01:25:18 PM
You didn't answer Squashman's question...
Title: Re: Need help with a simple batch script
Post by: hhaji on March 30, 2018, 02:26:25 PM
Adobe Acrobat DC Standard - I can edit PDF files with it, but I am not sure whether it does what you are asking. Sorry, Squashman.
Title: Re: Need help with a simple batch script
Post by: Geek-9pm on March 30, 2018, 03:08:12 PM
This is a reference:
https://helpx.adobe.com/acrobat/tutorials.html
Fill, edit, sign, and share PDFs on any device.

PDF does a good job of putting words and pictures together. But it is weak on arithmetic and things using numerical calculations, like designing a bridge.

In general, a PDF is for a visual presentation and is done by hand because it has a flair for things people do by hand. It is a presentation, not a sales report.

Well, you could use it for a sales report. But the data would come from some kind of source that does numbers, not pictures. Now if you are doing studies in handwriting, you would use pictures.

So again, what do you want to do? What is it you want to show other people? Some numbers? Some pictures? Why is the information in a PDF to start with? Is it your research? Did it start out as a PDF and not a spreadsheet?

The key here is not programming languages, but rather what kind of work you are doing. Is this something in medical treatment? Is this for a sales team? Are you investigating handwritten documents? Are you building a rocket ship?

Just want to help.  8)
Title: Re: Need help with a simple batch script
Post by: BC_Programmer on March 30, 2018, 07:43:18 PM
Quote
1.   Open foo.pdf
2.   Save as foo.jpg
3.   Open foo.jpg
4.   Print to pdf as FOOO.pdf
5.   Open FOOO.pdf
6.   Export to foo.xls

I'm skeptical that this can be done with a Batch script, at least not without third-party software that provides most of the features. Particularly since based on your description previously you want to have the PDF import from jpeg actually attempt to recognize text.
Title: Re: Need help with a simple batch script
Post by: Geek-9pm on March 30, 2018, 07:48:27 PM
Quote
I'm skeptical that this can be done with a Batch script, at least not without third-party software that provides most of the features. Particularly since based on your description previously you want to have the PDF import from jpeg actually attempt to recognize text.
That is about what I was thinking too.
Title: Re: Need help with a simple batch script
Post by: nil on March 31, 2018, 05:54:56 AM
It sounds like you have two separate problems: you need to figure out a process to extract data from a PDF, and also you'd like to automate that process -- whatever it is.

It sounds like you need to extract text and images separately from a PDF, is that correct?

It would help if you describe in detail what process you currently use --
Title: Re: Need help with a simple batch script
Post by: DaveLembke on March 31, 2018, 09:54:55 AM
I'm thinking along the same lines as BC: third-party software would be needed. I use a keyboard/mouse macro recorder for automating repetitive stuff. It basically lets you go through the process manually the first time while recording your actions. Then you can compile that into an EXE and call it from batch or any other scripting or programming language as a system call. The software I use costs money, though, so if you are looking for a free solution, I don't have one.

I use Jitbit Macro Recorder for automated stuff. https://www.jitbit.com/macro-recorder/

The cool thing is that you can also edit the recording before compiling it as a standalone EXE. You can speed it up by removing the time delays left over from recording slowly to avoid mistakes. It also shows a linear execution list of what it is going to do, so you can add a goto loop to make it run over and over, or add and remove steps to sharpen it up for whatever application you have.

Here is a video of how it operates: https://www.jitbit.com/macro-recorder/macro-recorder-video-tutorial/
Title: Re: Need help with a simple batch script
Post by: Squashman on March 31, 2018, 12:51:41 PM
PDF to JPG to PDF to Excel? That is a big *censored* in my book. Makes no logical sense.
Title: Re: Need help with a simple batch script
Post by: patio on March 31, 2018, 02:04:26 PM
Quote
PDF to JPG to PDF to Excel? That is a big *censored* in my book. Makes no logical sense.

It sounds to me that the work staff are giving him impossible tasks... this is for work, as per the first post.
Why they would do this I have no clue...
Title: Re: Need help with a simple batch script
Post by: Geek-9pm on March 31, 2018, 04:21:47 PM
Quote
It sounds to me that the work staff are giving him impossible tasks... this is for work, as per the first post.
Why they would do this I have no clue...
Let me guess.
A Government Agency?   :P 
Title: Re: Need help with a simple batch script
Post by: hhaji on April 02, 2018, 08:34:09 AM
Thanks everyone for trying to help me solve this problem. The data was output by an older version of an automated measuring tool (CMM) as PDF files containing pictures and text. Since then, the data has been made available in Excel, which is manageable. My task is to extract numeric data from these old files, specifically from the picture part of the PDF. The above strategy helps me do that manually through the sequence of conversions. I am looking for a way to automate the steps mentioned above.

Hope this helps.
Title: Re: Need help with a simple batch script
Post by: patio on April 02, 2018, 08:49:54 AM
PDFs can be converted into Excel forms...

Or CSV... though I don't know if CSVs would be easier...
Title: Re: Need help with a simple batch script
Post by: hhaji on April 12, 2018, 02:36:01 PM
I have the standard Acrobat and I can turn a fully pictured PDF into an ugly Excel file. But since all these CMM PDF files were generated the same way, I can always go to the same row 35 and column 12 and extract the needed data from them.

Hope this helps.

I am sorry I haven't replied lately; I got pulled into another firefighting project. But I am back.

Appreciate any help and guidance.
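
For illustration, the fixed-position lookup described above could be sketched in Python roughly as follows. This is only a sketch under assumptions: it assumes each converted spreadsheet has first been saved as CSV so that only the standard library is needed, and the `test*.csv` file pattern and the function names `read_cell` and `collect_cells` are hypothetical.

```python
import csv
from pathlib import Path


def read_cell(csv_path, row=35, col=12):
    """Return the value at a 1-based (row, col) position, or None if absent."""
    with open(csv_path, newline="") as fh:
        for row_no, fields in enumerate(csv.reader(fh), start=1):
            if row_no == row:
                # Guard against rows that are shorter than expected.
                return fields[col - 1] if len(fields) >= col else None
    return None  # the file has fewer rows than requested


def collect_cells(folder, pattern="test*.csv", row=35, col=12):
    """Map each matching file name to the value found at (row, col)."""
    # The file pattern is an assumption; adjust it to the real file names.
    return {
        path.name: read_cell(path, row, col)
        for path in sorted(Path(folder).glob(pattern))
    }
```

Calling `collect_cells` on the folder holding the converted files would give one value per file. A `None` entry flags a file whose layout did not match, which is worth checking by hand.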
Title: Re: Need help with a simple batch script
Post by: Geek-9pm on April 12, 2018, 06:37:31 PM
This is not an answer.
This is a word of caution.

There is a very strong misconception about data generated by a computer.
Many laymen, and even professionals, believe data from a computer is perfect.
Of course, that is not true. There is an old saying:
"Garbage in, Garbage out."
Automatic conversion of numerical data from one format to another can lead to very harmful errors. With text, we humans often spot nonsense. That is a safeguard that prevents us from publishing garbage.

With numerical data, we humans do not know if the data is correct or not. When a report or a graphic is the result of Excel numerical data, people tend to believe it is flawless.

There have been some very bad financial disasters from bad data in spreadsheets.
https://www.telegraph.co.uk/finance/newsbysector/banksandfinance/11518242/Stupid-errors-in-spreadsheets-could-lead-to-Britains-next-corporate-disaster.html
Quote
Almost one in five large businesses have suffered financial losses as a result of errors in spreadsheets, according to F1F9, which provides financial modelling and business forecasting to blue chips firms. It warns of looming financial disasters as 71pc of large British business always use spreadsheets for key financial decisions.
and...
https://www.nytimes.com/2013/04/19/opinion/krugman-the-excel-depression.html

Some have said that anybody who works with spreadsheets needs to be carefully vetted as to sanity and soundness of mind.

Just a warning.  ???

 

Title: Re: Need help with a simple batch script
Post by: nil on April 12, 2018, 06:49:13 PM
Will it help if you can convert the PDF to a TXT file on the command line? If so, you can use Ghostscript to do this.

https://www.ghostscript.com/download/

For example, to create an output file pdf-output.txt from input file pdf-sample.pdf:

Code: [Select]
gswin64c -sDEVICE=txtwrite -o pdf-output.txt pdf-sample.pdf
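
To repeat that command over the numbered files from the first post (example1.pdf through example100.pdf), a loop along these lines might work. This is a minimal Python sketch, not a tested solution: it assumes `gswin64c` is on the PATH, and the folder names and the function names `build_gs_command` and `convert_all` are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_gs_command(pdf_path, txt_path, gs_exe="gswin64c"):
    """Return the Ghostscript argument list for one PDF-to-text conversion."""
    return [gs_exe, "-sDEVICE=txtwrite", "-o", str(txt_path), str(pdf_path)]


def convert_all(src_dir, out_dir, first=1, last=100, run=subprocess.run):
    """Convert example<first>.pdf .. example<last>.pdf and return the outputs."""
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    converted = []
    for i in range(first, last + 1):
        pdf = src_dir / f"example{i}.pdf"
        if not pdf.exists():
            continue  # skip gaps in the numbering instead of failing
        txt = out_dir / f"example{i}.txt"
        # check=True raises an error if Ghostscript exits with a failure code.
        run(build_gs_command(pdf, txt), check=True)
        converted.append(txt)
    return converted
```

It could be called as `convert_all("D:/", "C:/pdf-text")`, with both paths being placeholders. Keep in mind that the txtwrite device only extracts text that is stored as text in the PDF; numbers that exist only as pixels in a picture would still need an OCR step.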