gzip Is Really Good at, Like, Compressing Repetitive Text and Stuff

A post in which Ethan reassures you that gzip isn't scary, and reminds you to prove junk you say.

I hear it a lot. I say it a lot.

"It's a lot of code, but gzip will shrink it down. gzip is really good at, like, compressing repetitive text and stuff."

Whenever you hear this (or any similar phrase), perk up your ears. An assumption is being made here. The statement is true. gzip is, in fact, good at compressing repetitive text. But that's not to say you're allowed to use it as a "get out of jail free" card.

"It's a lot of code, but gzip will handle this well," is a pretty qualitative thing to say. It'd be more useful if we could say something more quantitative like, "It's a lot of code, but after gzipping it, it's only 0.5 KB!" You know it's better because it sounds like something a scientist would say. And getting the information to prove your assumption isn't nearly as difficult as you think.

You're Good Enough, You're Smart Enough, and Doggone It, People Like You

I always thought of gzip as a thing that I couldn't quite fully grasp, so I decided to think about other things. I couldn't understand it fully, so why even bother? I'd let people who are smarter than me think about those things. People always throw around sizes of gzipped files like it's nothing. This was always a little intimidating and mysterious to me.

"Where does that number come from? Can I measure it? Do I need to use a server? Or my browser? Do I need to edit an .htaccess file? What's an .htaccess file? I hope I don't need to edit an .htaccess file." Panicked questions. Downward spiral. No bueno. This is the part where I start thinking about other things.

The good news is that you don't have to understand it. I don't understand it. I don't understand it, and I'm writing a blog post about it.

Here's what we need to know about gzip:

  • gzip is a way to compress files.
  • Servers can gzip files before sending them to browsers.
  • Browsers can uncompress gzipped files.
  • Using gzip means fewer bytes being sent over the wire.
  • gzip is really good at, like, compressing repetitive text and stuff.
  • You can validate the "gzip's got my back" assumption yourself. If you're on a Mac (or Linux with gzip installed), use this command:
gzip -c path/to/file | wc -c

This outputs the number of bytes a file would be if it were gzipped. All this is doing is gzipping a file to standard output (that's the -c flag on gzip, which also leaves your original file untouched), then piping the result to the wc utility, whose own -c flag counts the bytes you send it. If you're on Windows or you don't want to use the command line, you can still find gzipped file sizes pretty easily.
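If you want to watch the "repetitive text" claim happen, here's a minimal sketch. It builds two throwaway files of the same size, one very repetitive and one random, and runs both through the command above. The gz_size helper and the temp file names are made up for this example, not anything standard.

```shell
#!/bin/sh
# gz_size FILE: print the gzipped size of FILE in bytes.
# (gz_size is a hypothetical helper name; tr strips the padding some wc versions add.)
gz_size() {
  gzip -c "$1" | wc -c | tr -d '[:space:]'
}

# Build two throwaway 10,000-byte files to compare.
repetitive=$(mktemp "${TMPDIR:-/tmp}/gz-repetitive.XXXXXX")
random=$(mktemp "${TMPDIR:-/tmp}/gz-random.XXXXXX")
yes 'color: red;' | head -c 10000 > "$repetitive"   # the same line over and over
head -c 10000 /dev/urandom > "$random"              # no repetition to exploit

rep_gz=$(gz_size "$repetitive")
rand_gz=$(gz_size "$random")

echo "repetitive: 10000 bytes raw, $rep_gz bytes gzipped"
echo "random:     10000 bytes raw, $rand_gz bytes gzipped"

# Clean up the throwaway files.
rm -f "$repetitive" "$random"
```

The repetitive file should come out tiny and the random one should barely budge. Your real code will land somewhere in between, which is exactly why it's worth measuring.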

And that's enough to get on a path to prove gzip's value in a more tangible way.

Put Up or Shut Up

If you can check gzipped file sizes, then you can make comparisons. You can perform experiments and do science. Next time you hear somebody assume that gzip will shrink something down effectively, test it!

  1. Get gzipped size.
  2. Make change.
  3. Get gzipped size again.
  4. Compare sizes.

That "make change" step works beautifully with version control. You can compare the size of a file on your master branch to the size of that same file on a fancy-new-feature branch. Or a hardcore-refactor branch. You can see exactly how many (or few!) bytes you'll be sending over the wire to users after implementing a feature. Or refactoring code. You can use hard numbers to help decide if a feature is worth the bytes. You can see if mixins are better than extends on a case-by-case basis. That's potent information.
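The four steps above, done across two branches, can be sketched like this. It assumes you're inside a git repository. The gz_size_at and gz_diff helpers are hypothetical names I made up, and the branch and file names in the example are placeholders.

```shell
#!/bin/sh
# gz_size_at REF PATH: gzipped size, in bytes, of PATH as committed at REF.
# REF can be any branch, tag, or commit. (Hypothetical helper name.)
gz_size_at() {
  # Bail out if the file doesn't exist at that ref, rather than
  # quietly measuring an empty stream.
  git cat-file -e "$1:$2" || return 1
  git show "$1:$2" | gzip -c | wc -c | tr -d '[:space:]'
}

# gz_diff BASE FEATURE PATH: print both gzipped sizes and the difference.
gz_diff() {
  before=$(gz_size_at "$1" "$3") || return 1
  after=$(gz_size_at "$2" "$3") || return 1
  echo "$1: $before bytes gzipped"
  echo "$2: $after bytes gzipped"
  echo "difference: $((after - before)) bytes"
}

# Example (branch and file names are placeholders):
# gz_diff master fancy-new-feature css/main.css
```

Because git show reads straight from the repository, you don't need to check out either branch or stash anything. A negative difference means the change made the gzipped file smaller.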

Keep in mind that if you're minifying your assets (which is encouraged!), you'll get more accurate numbers by minifying your code before gzipping it. And yes, there's still a benefit to gzipping assets even if you're minifying them.
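Here's a rough sketch of measuring in that order. Big caveat: crude_minify is a stand-in I made up that only strips indentation and newlines, so swap in whatever minifier your build actually uses. The generated CSS is fake sample data too.

```shell
#!/bin/sh
# byte_count: count the bytes arriving on stdin, without wc's padding.
byte_count() {
  wc -c | tr -d '[:space:]'
}

# crude_minify: stand-in for a real minifier (an assumption for this sketch).
# It only strips leading whitespace and newlines.
crude_minify() {
  sed 's/^[[:space:]]*//' | tr -d '\n'
}

# Generate some fake, nicely indented CSS to measure.
css=$(mktemp "${TMPDIR:-/tmp}/gz-sample.XXXXXX")
i=0
while [ "$i" -lt 200 ]; do
  printf '.item-%s {\n    color: red;\n    margin: 0 auto;\n}\n\n' "$i"
  i=$((i + 1))
done > "$css"

raw=$(byte_count < "$css")
min=$(crude_minify < "$css" | byte_count)
gz=$(gzip -c "$css" | byte_count)
min_gz=$(crude_minify < "$css" | gzip -c | byte_count)

echo "raw:               $raw bytes"
echo "minified:          $min bytes"
echo "gzipped:           $gz bytes"
echo "minified + gzipped: $min_gz bytes"

rm -f "$css"
```

The last number is the one your users actually download, so that's the one to compare between changes.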

Enabling gzip on the Server

gzip may not be enabled on your server by default. Don't freak out if it's not. If you're using Apache, you can copy and paste this code into your .htaccess or httpd.conf file:

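# Make sure .js and .css files are served with MIME types that get compressed below.
<IfModule mod_mime.c>
 AddType application/x-javascript .js
 AddType text/css .css
</IfModule>
# Compress text-based responses (only if mod_deflate is loaded).
<IfModule mod_deflate.c>
 AddOutputFilterByType DEFLATE text/css application/x-javascript text/x-component text/html text/richtext image/svg+xml text/plain text/xsd text/xsl text/xml image/x-icon application/javascript
 # Work around old browsers with broken gzip support.
 <IfModule mod_setenvif.c>
  # Netscape 4.x can only handle compressed text/html.
  BrowserMatch ^Mozilla/4 gzip-only-text/html
  # Netscape 4.06-4.08 can't handle compression at all.
  BrowserMatch ^Mozilla/4\.0[678] no-gzip
  # MSIE also identifies as Mozilla/4 but copes fine, so lift both restrictions.
  BrowserMatch \bMSIE !no-gzip !gzip-only-text/html
 </IfModule>
 # Tell caches that the response can differ by browser.
 <IfModule mod_headers.c>
  Header append Vary User-Agent env=!dont-vary
 </IfModule>
</IfModule>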

Source: How To *NIX

If you're using a different server, enabling gzip is probably trivial. Give it a quick Google search. I believe in you.
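Whichever server you're on, you can check that it's really sending gzipped responses instead of assuming it is. This is a minimal sketch that assumes curl is installed. The served_gzipped and check_gzip helpers are hypothetical names, and the URL in the example is a placeholder.

```shell
#!/bin/sh
# served_gzipped: read HTTP response headers on stdin and succeed if they
# include "Content-Encoding: gzip". (Hypothetical helper name.)
served_gzipped() {
  grep -qi '^content-encoding:.*gzip'
}

# check_gzip URL: request URL while advertising gzip support, throw away
# the body, and report whether the server compressed the response.
check_gzip() {
  if curl -s -o /dev/null -D - -H 'Accept-Encoding: gzip' "$1" | served_gzipped; then
    echo "gzip is on for $1"
  else
    echo "gzip is NOT on for $1"
  fi
}

# Example (placeholder URL):
# check_gzip https://example.com/css/main.css
```

Test a CSS or JavaScript file rather than just the home page, since servers are often set up to compress some content types and not others.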

You have no excuse now. You'll think of this goofy post every time somebody says "gzip will handle it." You know gzip isn't scary and you're smart enough to use it. You know it's easy to check gzipped file sizes. You know how to validate your assumptions. You scientist, you.