Whenever I’m upset about the world, and where things are going, I can always find solace in Nine Inch Nails. Laugh at my gothic tragicness if you must, fools, but you’re the ones missing out. Leaving Hope is a brilliant, beautiful piece of work. My love is an unashamed love.
NIN.com has undergone renovations recently, including a fantastic low-tech redesign (read: I wish I had thought of it first), a near-complete discography, studio updates, and a surprisingly candid question-and-answer section.
I spent some time today poring over various text-analysis papers and theories, trying to determine the best solution to my spam problem. It is a very large problem, and I fear that the best solution is to change my email address.
However, I did learn a lot about Apple’s mail.app functions, and how it handles spam.
I’m currently using a combination of methods to defeat The Spam, but mostly I’m using a form of Bayesian filtering. Bayesian filtering works like this: Every word (or string) in every email is assigned a score, based on the probability of that word being used in spam mail. The scores are totalled, and if the total is larger than a predefined value, then the message is tagged as spam.
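For the curious, here is a minimal Python sketch of that scoring idea. It is not what any particular mail client does: the function names, the add-one smoothing, the log-odds score, and the threshold of 0 are all my own assumptions, picked to keep the example short.

```python
import math
from collections import Counter


def train(spam_msgs, ham_msgs):
    """Count how many spam and non-spam messages contain each word."""
    spam_counts = Counter(w for m in spam_msgs for w in set(m.lower().split()))
    ham_counts = Counter(w for m in ham_msgs for w in set(m.lower().split()))
    return spam_counts, ham_counts, len(spam_msgs), len(ham_msgs)


def word_score(word, model):
    """Log-odds that a word comes from spam (an assumed scoring choice)."""
    spam_counts, ham_counts, n_spam, n_ham = model
    # Add-one smoothing so unseen words do not produce log(0).
    p_spam = (spam_counts[word] + 1) / (n_spam + 2)
    p_ham = (ham_counts[word] + 1) / (n_ham + 2)
    return math.log(p_spam / p_ham)


def is_spam(message, model, threshold=0.0):
    """Total the per-word scores and compare against a predefined value."""
    total = sum(word_score(w, model) for w in set(message.lower().split()))
    return total > threshold


# Tiny made-up training set, purely for illustration.
model = train(
    spam_msgs=["cheap pills now", "cheap watches now"],
    ham_msgs=["lunch at noon", "meeting notes attached"],
)
```

With that toy training set, `is_spam("cheap pills", model)` comes out true and `is_spam("lunch meeting", model)` comes out false. A real filter would need far more mail to learn from, and a threshold tuned to taste.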
What mail.app does is this: The occurrence of each word (or string) is counted, and stored in a table. Each message is assigned a point in N-dimensional space, where each axis is a word (or string) and an email’s position along each axis is how often that string appears in the email. You can then do dimensionality reduction on this space, collapsing dimensions which are noisy, useless, or redundant.
By eliminating dimensions in this way, you can bypass a lot of spam trickery. Putting unique words in spam (amoral legendary chrysolite, anyone?) doesn’t help the message’s chances of passing through the filter, etc., etc. Just throw away the bigger common null subspace in the word matrices, do a little massaging, and, voilà! Latent semantic analysis!
(Or, if you prefer: Magic!)
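If you would rather see the trick than take it on faith, here is a rough Python sketch of the word-count space described above. To be loud about it: this is not latent semantic analysis and not mail.app's code. Real LSA uses a singular value decomposition to collapse dimensions. As a much cruder stand-in, this sketch simply drops any word that appears in fewer than two messages. The function names and the `min_messages` cutoff are made up for the example.

```python
from collections import Counter


def count_vectors(messages):
    """Turn each message into a vector of word counts over a shared vocabulary."""
    tokenised = [m.lower().split() for m in messages]
    vocab = sorted({w for words in tokenised for w in words})
    vectors = []
    for words in tokenised:
        counts = Counter(words)
        # One axis per vocabulary word; the coordinate is how often it appears.
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors


def drop_rare_dimensions(vocab, vectors, min_messages=2):
    """Crude stand-in for dimensionality reduction: keep only the axes
    (words) that are non-zero in at least `min_messages` messages."""
    keep = [
        i for i in range(len(vocab))
        if sum(1 for v in vectors if v[i] > 0) >= min_messages
    ]
    return [vocab[i] for i in keep], [[v[i] for i in keep] for v in vectors]


# Made-up messages; the first one carries some random filler words.
messages = [
    "cheap pills amoral legendary chrysolite",
    "cheap pills now",
    "lunch now",
]
vocab, vectors = count_vectors(messages)
kept, reduced = drop_rare_dimensions(vocab, vectors)
```

Here the full space has seven axes, and after pruning only `cheap`, `now`, and `pills` survive. The filler words in the first message no longer move it anywhere, which is the point: unique nonsense words carry no weight once their dimensions are gone.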
Gmail appears to have (accidentally, I assume) upped my storage limit to one terabyte from one gigabyte. If anyone has any 1,000,000MB files they need to email me, now’s the time.