Data compression Question?

  • tec0
    Diamond Member

    • Jun 2009
    • 4624

    #16
    Well, don’t blame me for the lingo. Some sites say compression, some say encoding; for most of them it comes down to encoding, but it is what it is. Do a Google search on “Video Compression” and, with few exceptions, you will find that most sites offer converters.
    peace is a state of mind
    Disclaimer: everything written by me can be considered as fictional.


    • irneb
      Gold Member

      • Apr 2007
      • 625

      #17
      Originally posted by AndyD
      This wouldn't be compressing, it would be encoding.
      I'd say it's more of a semantic difference, but I understand your point. In general, lossy is still referred to as compression (even though it's actually a re-encoding). The problem is that lossy is only practical on certain types of data (e.g. video, sound, fractals). Whenever the data / code needs to be character perfect you cannot use lossy at all - you need a perfect recreation.

      Regarding the ISO with duplicate files, I know some writer software allows you to create several entries in the allocation table which simply point to the same spot in the image / disc. That still works when read from a mounted image / inserted CD/DVD (don't know if it breaks the ISO spec though). But if that were used, it wouldn't cause this strangeness of an ISO being a lot smaller as originally compressed than you can compress it afterwards. The dummy file idea might be another explanation for what is happening here, i.e. the EXE you've downloaded generates the ISO image but includes some dummy files with random content - so the EXE doesn't actually have that data stored, it just puts garbage into those files (which may simply be whatever was on the blank portion of your HDD at the time). The loss-less compressor then sees that portion as a file that needs to be replicated exactly, so it can't throw away the garbage, and random garbage has almost no patterns to compress.

      The AVI/MPEG/MP4/OGM/MKV/WMV/etc. file is just a container for data streams (in particular video and sound). Inside, it contains the data encoded by a codec (XVid / DivX / Mpeg / UncompressedRGB / HuffYUV / H263 / Mpeg4 / WMV9 / QPeg / X264 / etc.). These could be lossy or loss-less - but none of the compression packages (WinZip / WinRAR / 7Zip / etc.) actually use any of these codecs. They ALWAYS use a loss-less compression algorithm. Some work quite nicely with media files (e.g. Rar & 7Zip can compress an AVI with Uncompressed RGB video much better than most loss-less codecs could) - strange, but I've seen it happen. Usually these container files are already encoded as efficiently as possible, so using RAR/ZIP/7Z on them afterwards has little effect (they can only compress what they can see as a pattern / duplication).
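
A quick way to see why garbage or already-encoded data barely shrinks is to feed a loss-less compressor two inputs of the same size. This is only a sketch using Python's standard `zlib` module (Deflate, the same family ZIP uses); the `ratio` helper and the sample data are made up for illustration, with seeded random bytes standing in for the "garbage" dummy files.

```python
import random
import zlib


def ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data, 9)) / len(data)


# Highly repetitive input: the compressor finds the same pattern everywhere.
patterned = b"the same line of text over and over\n" * 2000

# Seeded pseudo-random bytes, standing in for "garbage" file content.
rng = random.Random(0)
garbage = bytes(rng.getrandbits(8) for _ in range(len(patterned)))

print(f"patterned: {ratio(patterned):.3f}")
print(f"garbage:   {ratio(garbage):.3f}")
```

The patterned input shrinks to a tiny fraction of its size, while the random bytes come out about as large as they went in, because there is no duplication to point at.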

      BTW, that's the main method of these compressors. They compare consecutive portions of the data to see if there's any duplication or some sort of pattern which repeats. Usually they create a dictionary of sections to work out duplication, then simply store which item in the dictionary goes where in the file. That way, instead of storing the data numerous times, the compressor stores the data once and places a pointer numerous times. Unfortunately the dictionary has a limit due to RAM & speed, so every now and again a dictionary would be flushed and started anew. With ZIP, this is the main reason it's not a great compressor - the maximum dictionary size is rather small compared to RAR / 7Z. E.g. if you use ZIP the dictionary is 32KB, but with 7Z it can be 64KB all the way up to 64MB. A similar idea is used in video encoding: usually those codecs have a "key-frame" every preset number of frames / amount of time which gets stored as one full frame - the following frames only store the differences from that frame. It's similar with sound, but to explain that you would need to understand how sound is encoded even in an uncompressed WAV file (so I'm not going through all that - it wouldn't add anything to the discussion about compression).
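
The "store it once, then point at it" idea can be sketched as a toy sliding-window compressor. This is not the actual ZIP / RAR / 7Z code, just a minimal LZ77-style illustration; the function names, the token format and the sample data are all invented for the example. The `window` parameter plays the role of the dictionary size.

```python
def lz77_compress(data: bytes, window: int = 4096,
                  min_len: int = 3, max_len: int = 258):
    """Return tokens: ('lit', byte) or ('ref', distance_back, length)."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Only the last `window` bytes can be pointed at.
        for j in range(max(0, i - window), i):
            length = 0
            while (length < max_len and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_len:
            tokens.append(("ref", best_dist, best_len))
            i += best_len
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens


def lz77_decompress(tokens) -> bytes:
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, dist, length = tok
            for _ in range(length):
                out.append(out[-dist])  # copy from earlier output
    return bytes(out)


phrase = b"hello world, "
data = phrase + bytes(range(100)) + phrase  # repeat is 113 bytes back

small = lz77_compress(data, window=16)    # repeat is out of reach
big = lz77_compress(data, window=4096)    # repeat becomes one pointer
print(len(small), len(big), big[-1])
```

With the small window the second copy of the phrase has to be stored byte by byte; with the larger one it collapses into a single pointer, which is the effect the dictionary size has in the real tools.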

      Then the "word" size also has an effect. This is the portion inside the dictionary which can be repeated in the code. E.g. ZIP can have words from 8-258 bytes, 7Z 8-273. In the codecs the equivalent is each pixel in the frame. If the pixel is the same as that of the key-frame, then the frame just points to it (or even leaves it out in some codecs). If it's different it gets stored as usual (or using a form of pattern prediction - see later).
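
The key-frame idea can be sketched in a few lines. This is a deliberately naive model, assuming each frame is just a flat list of pixel values; real codecs do far more (motion compensation, block transforms), and the function names and the `keyframe_interval` parameter are hypothetical.

```python
def encode_frames(frames, keyframe_interval=4):
    """Store a full key-frame periodically, otherwise only changed pixels."""
    encoded = []
    key = None
    for n, frame in enumerate(frames):
        if n % keyframe_interval == 0:
            key = list(frame)
            encoded.append(("key", key))
        else:
            # Keep only the pixels that differ from the key-frame.
            diff = {i: v for i, (k, v) in enumerate(zip(key, frame)) if k != v}
            encoded.append(("delta", diff))
    return encoded


def decode_frames(encoded):
    frames = []
    key = None
    for kind, payload in encoded:
        if kind == "key":
            key = list(payload)
            frames.append(list(key))
        else:
            frame = list(key)  # start from the key-frame
            for i, v in payload.items():
                frame[i] = v   # patch in the changed pixels
            frames.append(frame)
    return frames


frames = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 9, 0, 0, 0, 0, 0],
    [0, 0, 9, 9, 0, 0, 0, 0],
]
encoded = encode_frames(frames)
print(encoded[1], encoded[2])
```

Unchanged pixels cost nothing in the delta frames, and decoding still gives back every frame exactly, so this particular sketch is loss-less.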

      Then there's what's known as "solid" archiving. RAR / 7Z usually use this; ZIP doesn't have this facility. Basically it means that the dictionary is shared across multiple files. With ZIP each compressed file has at least 1 dictionary of its own. Solid archiving can give much better compression if you're compressing several similar files into one archive. It is rarely used in media encoding, but could be possible if the container file has multiple streams (e.g. VOB files on DVDs could have multiple video streams, several audio streams and a whole bunch of subtitle streams). Usually it's not used in media since the streams are seldom similar enough to effect much compression.
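
The effect of a shared dictionary can be imitated with Python's standard `zlib` module by compressing several similar "files" one at a time versus as one joined stream. This only approximates solid archiving (it is not how RAR or 7Z are implemented), and the file contents here are invented: a block of seeded random bytes shared by every file plus a short unique tail.

```python
import random
import zlib

rng = random.Random(42)
shared = bytes(rng.getrandbits(8) for _ in range(2000))

# Five "files" that are mostly identical to each other.
files = [shared + f"unique tail of file {n}".encode() for n in range(5)]

# ZIP-like: every file gets a fresh dictionary of its own.
separate_size = sum(len(zlib.compress(f, 9)) for f in files)

# Solid-like: one stream, so later files can point back into earlier ones.
solid_size = len(zlib.compress(b"".join(files), 9))

print(separate_size, solid_size)
```

Compressed separately, each file pays for the shared block in full; compressed as one stream, only the first copy is stored and the rest become pointers.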

      And then lastly there's the near wizardry of pattern prediction. This is a lot closer to those loss-less codecs, but combined with the above ideas. I.e. a word in the dictionary is compared to a portion of the file and, seeing as it's "almost" the same, the word's index is stored with a code to describe the difference. This is usually the main portion adjusted through the "compression level" - a higher level will allow for a larger difference. It generally doesn't help all that much when compressing normal code / data, and it makes compression and de-compression a lot slower. In video / sound the compression ratio becomes a lot better, since a colour could be slightly off another or a frequency only a touch different. So here it gets used quite often, especially in lossy codecs, but even some loss-less codecs use it since it need not be lossy.
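
A much simpler relative of this idea is delta coding: predict each value from the previous one and store only the difference. The sketch below swaps that in for the dictionary-based prediction described above, so treat it as an illustration of "store the difference from a guess" rather than of any particular archiver. The signal is a seeded random walk invented for the example, and the helper names are hypothetical.

```python
import random
import zlib

# A slowly drifting 8-bit signal: each sample is close to the previous one.
rng = random.Random(1)
samples = []
value = 128
for _ in range(20000):
    value = (value + rng.choice([-1, 0, 1, 2])) % 256
    samples.append(value)


def to_residuals(values):
    """Replace each value by its difference from the previous one (mod 256)."""
    prev = 0
    out = []
    for v in values:
        out.append((v - prev) % 256)
        prev = v
    return out


def from_residuals(residuals):
    """Undo to_residuals exactly, so nothing is lost."""
    prev = 0
    out = []
    for r in residuals:
        prev = (prev + r) % 256
        out.append(prev)
    return out


residuals = to_residuals(samples)
raw_size = len(zlib.compress(bytes(samples), 9))
residual_size = len(zlib.compress(bytes(residuals), 9))
print(raw_size, residual_size)
```

The residuals use only a handful of distinct values, so the generic compressor does much better on them than on the raw samples, yet `from_residuals` restores the input exactly. That is why prediction need not be lossy.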

      What loss-less never does is throw exactness away. E.g. sound frequencies which most human ears can't hear are thrown away when encoding MP3s. Some differences in shade in video can be ignored since the human eye can't pick up on them. These are the main aspects which make a "compression" lossy. But things like resolution, frame-rate, sampling rate, etc. can also be adjusted, since a human often can't pick up the differences (e.g. try to notice the difference between 30fps and 25fps, or 1080P and 720P on a 40" CRT screen, or 48000Hz and 44100Hz on an MP3 player). These can be considered to be re-encoding instead of compression.
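
Throwing exactness away can be sketched with plain quantization: keep only the top 4 bits of each 8-bit sample. This is not how MP3 or any video codec decides what to discard (they use perceptual models); it is just a minimal, assumed stand-in showing the trade. The noisy sine signal is made up for the example.

```python
import math
import random
import zlib

# An 8-bit signal: a sine wave with a little seeded noise on top.
rng = random.Random(7)
original = [
    min(255, max(0, int(128 + 100 * math.sin(i / 40) + rng.randint(-3, 3))))
    for i in range(20000)
]

# Lossy step: drop the low 4 bits of every sample (16 levels instead of 256).
quantized = [(s >> 4) << 4 for s in original]

max_error = max(abs(a - b) for a, b in zip(original, quantized))
original_size = len(zlib.compress(bytes(original), 9))
quantized_size = len(zlib.compress(bytes(quantized), 9))

print(max_error, original_size, quantized_size)
```

The quantized signal compresses far better, but the discarded low bits are gone for good: there is no way to get `original` back from `quantized`, which is exactly why this kind of trick is off limits for code or documents.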
      Gold is the money of kings; silver is the money of gentlemen; barter is the money of peasants; but debt is the money of slaves. - Norm Franz
      And central banks are the slave clearing houses


      • AndyD
        Diamond Member

        • Jan 2010
        • 4946

        #18
        Sorry guys, I wasn't trying to get into an argument about semantics, I was just getting confused about the processes being discussed.

        • tec0
          Diamond Member

          • Jun 2009
          • 4624

          #19
          Originally posted by AndyD
          Sorry guys, I wasn't trying to get into an argument about semantics, I was just getting confused about the processes being discussed.
          No need to worry about it. Lingo is sometimes confusing, and having me at the keyboard doesn’t help matters at all... Sometimes I think the only thing that understands me is my African grey.


          • Sparks
            Gold Member

            • Dec 2009
            • 909

            #20
            Are you saying that if I get an African Grey I will finally be understood?


            • tec0
              Diamond Member

              • Jun 2009
              • 4624

              #21
              Originally posted by Sparks
              Are you saying that if I get an African Grey I will finally be understood?
              Probably not... but they do need a lot of attention and they learn very quickly (“mine only took 14 years to train”), and they keep learning new things if you keep them company. They are very social and fun to have around, and if you “teach them right” they will always agree with you on everything (“as long as you have a treat”). But don’t get one if you don’t have a lot of spare time on your hands; they actually do commit suicide if they don’t get constant attention.

              That said, pot plants (not weed, aka marijuana) are also very understanding...
