Computer Hope

Software => Computer software => Topic started by: Whitebeard1 on November 13, 2013, 03:29:58 AM

Title: 7-zip Cannot Compress
Post by: Whitebeard1 on November 13, 2013, 03:29:58 AM
I have got the first billion digits of pi in .txt format. It is 1.5 GB big. I'm planning on sharing it on the internet, but I don't want to put a 1.5 GB text file up there. So, I tried to compress it using 7-zip. Unfortunately, 7-zip failed to compress it...it came up with a message: "system cannot allocate the required amount of memory."
Is there any way to fix this problem? Any help is appreciated. Thank you!
Title: Re: 7-zip Cannot Compress
Post by: PCdoc on November 13, 2013, 05:42:48 AM
Seems to be a common problem faced by many people. Found a link (http://sourceforge.net/p/sevenzip/discussion/45797/thread/dbc813cd) you might wanna see.
Title: Re: 7-zip Cannot Compress
Post by: Salmon Trout on November 13, 2013, 11:37:30 AM
Among compression tools, 7-zip is unusually hard on memory. Anyhow, why bother, when it is already being shared? You are only going to hammer your upload and web storage when a big research place has already done it. Also, if I want 1 billion places of pi, I (presumably) want to be able to trust they are the right ones, (OK I know 3.141592653 already) so where do I go, MIT or some guy on the web I never heard of?

Uncompressed (1 GB) at MIT

(Folks, don't click on this link in a browser unless you have a lot of time on your hands... right click and choose Save as...)

http://stuff.mit.edu/afs/sipb/contrib/pi/pi-billion.txt

7-zip compressed (490 MB)

http://micronetsoftware.com/pi_day/pi/pi.7z

I have downloaded the uncompressed text and using WinRAR (64-bit version) "best" compression setting I compressed it in 13 minutes and got a smaller file size than the 7-zip archive at Micronet...

     1,000,000,002 pi-billion.txt
       514,753,983 pi.7z
       434,964,971 pi-billion.rar


If I had to use 7-zip I would probably use some method of splitting the original file into chunks, or maybe if you ask 7-zip to split the archive you can reduce the memory hit?

Your figure of 1.5 GB for the plain text file seems a bit big.
Title: Re: 7-zip Cannot Compress
Post by: Geek-9pm on November 13, 2013, 12:40:58 PM
Using a very simple method you can reduce a billion digits to about 500 million bytes with little effort. A single decimal digit only needs 4 bits to represent its value. Reduced to binary as one very long integer, it would take even less: about 415 million bytes (log2(10) is roughly 3.32 bits per digit). Forget the decimal point. We all know the first three digits are 3.14 anyway.
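
A minimal sketch of the packing idea in Python, for anyone who wants to try it. The function names are made up for illustration. Two digits go into each byte (4 bits each), and the last function only estimates the size of the single-big-integer form; it does not do the conversion.

```python
import math


def pack_digits(digits: str) -> bytes:
    """Pack a string of decimal digits two per byte, 4 bits each."""
    if len(digits) % 2:
        digits += "0"  # pad odd-length input; the caller keeps the true length
    out = bytearray()
    for i in range(0, len(digits), 2):
        out.append((int(digits[i]) << 4) | int(digits[i + 1]))
    return bytes(out)


def unpack_digits(packed: bytes, count: int) -> str:
    """Reverse pack_digits, returning the first `count` digits."""
    digits = []
    for b in packed:
        digits.append(str(b >> 4))    # high 4 bits
        digits.append(str(b & 0x0F))  # low 4 bits
    return "".join(digits)[:count]


def binary_integer_bytes(n_digits: int) -> int:
    """Estimate the bytes needed to hold n_digits decimal digits as one binary integer."""
    return math.ceil(n_digits * math.log2(10) / 8)
```

Packed this way a billion digits take exactly 500,000,000 bytes, and binary_integer_bytes(10**9) comes out at roughly 415 million. A general-purpose compressor on the plain text lands in the same region, because the digits of pi look statistically random.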
Title: Re: 7-zip Cannot Compress
Post by: Salmon Trout on November 13, 2013, 12:54:38 PM
At the moment, I am trying to compress the 1-billion-place text file using 7-zip "ultra" compression, splitting the archive into 100 MB chunks. It estimates 35 minutes to completion, and Process Explorer's memory figure ("working set") has stabilised at 682 MB. I can imagine 7-zip would not be the compression tool of choice if RAM were limited. WinRAR never went over 300 MB.
Title: Re: 7-zip Cannot Compress
Post by: Salmon Trout on November 13, 2013, 01:05:54 PM
Time to complete has gone up to 48 minutes... [EDIT] it took 49 minutes... 5 files created

100,000,000 pi-billion.7z.001
100,000,000 pi-billion.7z.002
100,000,000 pi-billion.7z.003
100,000,000 pi-billion.7z.004
 40,941,343 pi-billion.7z.005
Title: Re: 7-zip Cannot Compress
Post by: Geek-9pm on November 13, 2013, 11:07:20 PM
Is the objective speed or size?
If you want speed and size, do it in machine code (assembly) by hand. This is a one-time thing, right? So you write a specific ASM program to do it once. The output will be a self-extracting EXE file that prints to the output device.
Title: Re: 7-zip Cannot Compress
Post by: Salmon Trout on November 14, 2013, 12:01:55 AM
Is the objective speed or size?

The objective was reduced size, for web sharing. Time taken to compress is not, I think, an issue.
Title: Re: 7-zip Cannot Compress
Post by: Salmon Trout on November 14, 2013, 01:14:57 PM
This approach might be fun:

With curl.exe http://curl.haxx.se/ you can download a range of pi digits so if you want to examine them you don't even need the billion digit file to be stored locally.

Store the first 10 digits of Pi in pi-10-digits.txt (the start offset is 2 because curl byte ranges count from 0 and the file begins with "3.", so the first decimal digit sits at offset 2)

curl -o pi-10-digits.txt -r 2-11 http://stuff.mit.edu/afs/sipb/contrib/pi/pi-billion.txt

result:

1415926535

Store the 50 digits of Pi starting at the 70th digit in pi-50-digits-from-70.txt (byte ranges are inclusive, so 50 digits from offset 71 is 71-120)

curl -o pi-50-digits-from-70.txt -r 71-120 http://stuff.mit.edu/afs/sipb/contrib/pi/pi-billion.txt

result:

40628620899862803482534211706798214808651328230664
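
The offset arithmetic is easy to get wrong by one, so here is a small Python sketch that turns "N digits starting at decimal digit K" into the range string for curl's -r option. The function name is hypothetical, and it assumes the file starts with the two characters "3." before the first decimal digit (which is what the results above suggest).

```python
def digit_range(first_digit: int, count: int, prefix_len: int = 2) -> str:
    """Return an inclusive, zero-based byte range "start-end" for curl -r.

    first_digit is the 1-based position of the first decimal digit wanted.
    prefix_len is the number of characters before the first decimal digit
    (assumed to be 2, for the leading "3.").
    """
    if first_digit < 1 or count < 1:
        raise ValueError("first_digit and count must both be at least 1")
    start = prefix_len + first_digit - 1
    end = start + count - 1  # inclusive, so subtract 1
    return f"{start}-{end}"
```

digit_range(1, 10) gives "2-11", and digit_range(70, 50) gives "71-120".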



Title: Re: 7-zip Cannot Compress
Post by: Geek-9pm on November 14, 2013, 03:15:51 PM
Salmon Trout, you are beyond brilliant!
How do you find such things?
Are you living inside of Google?
Title: Re: 7-zip Cannot Compress
Post by: Whitebeard1 on November 16, 2013, 12:27:18 AM
Came back, and saw all these great, helpful replies.....thanks very much. :) And Salmon Trout, the curl.exe was very interesting! Thanks!
Title: Re: 7-zip Cannot Compress
Post by: Salmon Trout on November 16, 2013, 06:39:48 AM
You can use curl.exe to get a range of bytes (or in this case, characters) from a local file. You have to convert the file path to a file url starting file:/// and using forward slashes instead of backslashes:

From the local file D:\Pi\pi-billion.txt, create a text file with the first 10 digits of Pi (the start offset is 2 because byte ranges count from 0 and the file begins with "3.", so the first decimal digit sits at offset 2):

curl -o pi-10-digits.txt -r 2-11 file:///d:/pi/pi-billion.txt

Thus you see that curl.exe can be used as a local file splitter.
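
If curl is not to hand, the same byte-range read from a local file can be sketched in a few lines of Python. The function name is made up for illustration; the range is zero-based and inclusive, to match what curl -r does.

```python
def read_byte_range(path: str, start: int, end: int) -> bytes:
    """Return bytes start..end (zero-based, inclusive) from a local file."""
    if start < 0 or end < start:
        raise ValueError("need 0 <= start <= end")
    with open(path, "rb") as f:
        f.seek(start)                    # jump straight to the offset
        return f.read(end - start + 1)   # inclusive range, hence the +1
```

For example, read_byte_range(r"D:\Pi\pi-billion.txt", 2, 11) should return the same 10 digits as the curl command above, without reading the rest of the file.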


Title: Re: 7-zip Cannot Compress
Post by: Salmon Trout on November 17, 2013, 12:22:28 PM
I found a pi calculator program for Windows by Fabrice Bellard http://www.bellard.org/pi/pi2700e9/tpi-0.9.3-win.zip

I calculated the first billion places of Pi on my home PC (AMD Phenom II 945, 4GB RAM, Windows 7 64 bit) and it took 1 hour 4 minutes. I chose to limit RAM usage to 1 GB and also to store some of the intermediate results on disk rather than keep them in RAM. If I had gone for the all-RAM option it would have been a lot quicker but the PC would have been quite slow at doing other tasks.

I compared the first million digits of my calculation with the first million of the MIT billion digit file and they are the same.
Title: Re: 7-zip Cannot Compress
Post by: Salmon Trout on November 17, 2013, 01:27:11 PM
Next: compare the full billion. Of course it won't prove that either set is "right"...
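
A rough sketch of how the full comparison could be done in Python without loading either file into memory. It assumes both files have exactly the same layout (same leading "3.", no line breaks or spacing differences); the function name is hypothetical.

```python
def first_difference(path_a: str, path_b: str, chunk_size: int = 1 << 20):
    """Return the byte offset of the first difference, or None if identical."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                # Locate the exact differing byte inside this chunk.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # No differing byte found: one file is a prefix of the other.
                return offset + min(len(a), len(b))
            if not a:  # both files ended together with no difference
                return None
            offset += len(a)
```

A result of None means the two files agree byte for byte; any number is the offset where they first disagree.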
Title: Re: 7-zip Cannot Compress
Post by: patio on November 17, 2013, 02:18:00 PM
Pi' are round...
Title: Re: 7-zip Cannot Compress
Post by: Salmon Trout on November 17, 2013, 02:23:09 PM
Pi' are round...

If you used pi to a billion places to calculate the circumference of the observable universe, the possible error would be less than an atom's width (I read somewhere). Hard to see what practical use such a number of digits would have, apart from the (definitely useful) purpose of developing computing algorithms etc.
Title: Re: 7-zip Cannot Compress
Post by: patio on November 17, 2013, 02:26:26 PM
If you used it to calculate a 2 inch circle the results would be the same...
Title: Re: 7-zip Cannot Compress
Post by: Salmon Trout on November 17, 2013, 02:47:05 PM
If you used it to calculate a 2 inch circle the results would be the same...

The error would be quite a bit less than an atom's width; less maybe than the Planck length? I'll have to do some arithmetic... see here http://www.trans4mind.com/personal_development/JavaScript/longnumAstronomical.htm

I once read a science fiction novel in which some scientists got a message from aliens telling them to calculate pi to some large number of digits and when they did so they found a message from the creators of the universe (some much older aliens)
Title: Re: 7-zip Cannot Compress
Post by: patio on November 17, 2013, 02:49:07 PM
The atom's size would be the same no matter the circumference since the circle and the formula are infinite...
Title: Re: 7-zip Cannot Compress
Post by: Salmon Trout on November 17, 2013, 03:56:35 PM
The atom's size would be the same no matter the circumference since the circle and the formula are infinite...

I am not sure what you mean here! A circle, whether the radius is 2 inches or 150 million light years, is not infinite in any way that I understand the meaning of that word.

This is what I am saying: A 2 inch circle has a (finite) circumference. It can be approximated by C = π x D. If you take a value of π with no decimal places (3) then the circumference comes out as 6 inches. Let us take values of π with increasing numbers of decimal places:

The circumference becomes (inches):

6
6.2
6.28
6.282
6.283
6.28318
6.283184
6.2831852
6.2831853
6.283185306

We are getting nearer all the time to the actual circumference (the error is getting less and less). Now if we used pi to 39 decimal places the difference between the actual circumference and the calculated figure would be very small. If the circle was 20 billion light years across, the error would be under 2 x 10 to the power of -13 metres, far less than the width of an atom (roughly 10 to the power of -10 metres). It follows that the error for a 2 inch circle would be less, in the same proportion as the ratio of 2 inches to 20 billion light years.
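
The arithmetic can be checked with a short Python sketch using the decimal module. The names are made up for illustration, pi is hard-coded to 50 decimal places as the "true" value, and the light year is taken as about 9.4607 x 10^15 metres, so treat the outputs as estimates.

```python
from decimal import Decimal, ROUND_DOWN, getcontext

getcontext().prec = 60  # enough working precision for 50-place pi

PI_50 = Decimal("3.14159265358979323846264338327950288419716939937510")
METRES_PER_LIGHT_YEAR = Decimal("9.4607e15")  # approximate value
METRES_PER_INCH = Decimal("0.0254")


def truncate(value: Decimal, places: int) -> Decimal:
    """Cut value off after the given number of decimal places (no rounding up)."""
    return value.quantize(Decimal(1).scaleb(-places), rounding=ROUND_DOWN)


def circumference_error(diameter_m: Decimal, places: int) -> Decimal:
    """Error in C = pi x D, in metres, when pi is truncated to `places` decimals."""
    return diameter_m * (PI_50 - truncate(PI_50, places))


if __name__ == "__main__":
    universe = Decimal("20e9") * METRES_PER_LIGHT_YEAR
    print("20 billion light years:", circumference_error(universe, 39), "m")
    print("2 inches:", circumference_error(2 * METRES_PER_INCH, 39), "m")
```

The error is simply the diameter multiplied by whatever was cut off pi, which is why it shrinks in direct proportion to the size of the circle.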






Title: Re: 7-zip Cannot Compress
Post by: Salmon Trout on November 19, 2013, 02:09:04 PM
.