I was kind of planning on working through NeHe's tutorials but after blazing through the first 10 I realized I hadn't really forgotten that much and actually had a pretty good idea of what I was doing. Still, it was a nostalgic trip.
I decided that I really needed anti-aliasing. And not just some generic MSAA: I needed CSAA ("Coverage Sampling Antialiasing"), an extension available on NVidia's GeForce 8 cards. NVidia has a page with details (see also the GL_NV_framebuffer_multisample_coverage extension), but basically they claim that it "rivals the quality of 8x or 16x MSAA" while having overhead comparable to 4x MSAA. The screenshots at the bottom of their write-up seem realistic based on my test application, so I won't bother (re-)posting them here, but it looks pretty nice.
They also mention that by "decoupling coverage from color/z/stencil" it's extremely efficient in terms of bandwidth and storage costs. They provide some information about the number of samples taken (8x CSAA uses 4 color/z/stencil samples and 8 coverage samples), but there aren't many details about storage requirements. I think I've figured out how to measure GPU computational overhead with reasonable accuracy (GL_EXT_timer_query), so I can partially check their claims about how it compares to 4x MSAA, but I'd like some way to get a better idea of CPU<->GPU bandwidth and VRAM usage. NVidia's PerfHUD may be the answer to my prayers. I'll take a look as soon as I get the chance.
The storage requirements in particular worry me: 4x or even 2x multi-sampling at resolutions around HD already costs a lot of memory (despite not looking all that great), and you run the risk of buffers spilling into main memory, with performance suffering as a result.
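To put a rough number on that worry, here is a back-of-the-envelope sketch in C. It is only an estimate under assumptions of my own: 4 bytes of color (RGBA8) and 4 bytes of depth/stencil (24 + 8 bits) per sample, every sample stored in full, no driver-side compression, and the resolve target not counted. The function names are hypothetical helpers, not part of any API.

```c
#include <assert.h>
#include <stdio.h>

/* Rough, uncompressed size of a multisampled framebuffer in bytes.
 * Assumption: every sample stores full color and depth/stencil data. */
unsigned long long msaa_framebuffer_bytes(unsigned width, unsigned height,
                                          unsigned samples,
                                          unsigned color_bytes,
                                          unsigned depth_stencil_bytes)
{
    assert(samples > 0);
    return (unsigned long long)width * height * samples
           * (color_bytes + depth_stencil_bytes);
}

/* Convert a byte count to mebibytes for easier reading. */
double bytes_to_mib(unsigned long long bytes)
{
    return (double)bytes / (1024.0 * 1024.0);
}

/* Print estimates for 1920x1080 at 1x, 2x and 4x (assumed RGBA8 + D24S8). */
void print_msaa_estimates(void)
{
    const unsigned sample_counts[] = { 1, 2, 4 };
    for (int i = 0; i < 3; ++i) {
        unsigned long long bytes =
            msaa_framebuffer_bytes(1920, 1080, sample_counts[i], 4, 4);
        printf("%ux: %.1f MiB\n", sample_counts[i], bytes_to_mib(bytes));
    }
}
```

Under those assumptions, 1920x1080 comes to roughly 63 MiB at 4x and about 32 MiB at 2x, before the resolve target. It says nothing about what CSAA's extra coverage samples cost, since I haven't found figures for that.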