Saturday, March 01, 2008

SPAM on Usenet



From Wikipedia, the free encyclopedia:
"Spamming is the abuse of electronic messaging systems to indiscriminately send unsolicited bulk messages. While the most widely recognized form of spam is e-mail spam, the term is applied to similar abuses in other media: instant messaging spam, Usenet newsgroup spam, Web search engine spam, spam in blogs, wiki spam, mobile phone messaging spam, Internet forum spam and junk fax transmissions.

Spamming is economically viable because advertisers have no operating costs beyond the management of their mailing lists, and it is difficult to hold senders accountable for their mass mailings. Because the barrier to entry is so low, spammers are numerous, and the volume of unsolicited mail has become very high. The costs, such as lost productivity and fraud, are borne by the public and by Internet service providers, which have been forced to add extra capacity to cope with the deluge. Spamming is widely reviled, and has been the subject of legislation in many jurisdictions."
Spam affects just about everybody who uses the Internet in one form or another. And in spite of what Bill Gates forecast in 2004, when he said that "spam will soon be a thing of the past", it is getting worse by the day. The European Union's Internal Market Commission estimated in 2001 that "junk e-mail" cost Internet users €10 billion per year worldwide, and the California legislature found that spam cost United States organizations alone more than $13 billion in 2007, including lost productivity and the additional equipment, software, and manpower needed to combat the problem.

Where does all that spam come from? Experts from SophosLabs (a developer and vendor of security software and hardware) analyzed spam messages caught by companies in the Sophos global spam monitoring network and came up with a list of the top 12 countries that spread spam around the globe:
  • USA - 28.4%;
  • South Korea - 5.2%;
  • China (including Hong Kong) - 4.9%;
  • Russia - 4.4%;
  • Brazil - 3.7%;
  • France - 3.6%;
  • Germany - 3.4%;
  • Turkey - 3.%;
  • Poland - 2.7%;
  • Great Britain - 2.4%;
  • Romania - 2.3%;
  • Mexico - 1.9%;
  • Other countries - 33.9%

There are many types of electronic spam, including:
  • E-mail spam: unsolicited e-mail;
  • Mobile phone spam: unsolicited text messages;
  • Messaging spam ("SPIM"): use of instant messenger services for advertisement or even extortion;
  • Spam in blogs ("BLAM"): posting random comments or promoting commercial services on blogs, wikis and guestbooks;
  • Forum spam: posting advertisements or useless posts on a forum;
  • Spamdexing: manipulating a search engine to create the illusion of popularity for webpages;
  • Newsgroup spam: advertisement and forgery on newsgroups.

For the purposes of this post we shall focus on newsgroup spam, the type of spam whose targets are Usenet newsgroups.
Usenet convention defines spamming as excessive multiple posting, that is, the repeated posting of a message (or substantially similar messages). During the early 1990s there was substantial controversy among Usenet system administrators (news admins) over the use of cancel messages to control spam. A cancel message is a directive to news servers to delete a posting, causing it to be inaccessible to those who might read it.
Some regarded this as a bad precedent, leaning towards censorship, while others considered it a proper use of the available tools to control the growing spam problem.
A culture of neutrality towards content precluded defining spam on the basis of advertisement or commercial solicitations. The word "spam" was usually taken to mean excessive multiple posting (EMP), and other neologisms were coined for other abuses — such as "velveeta" (from the processed cheese product) for excessive cross-posting.
A subset of spam was deemed cancellable spam, for which it is considered justified to issue third-party cancel messages.
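
To make the idea of a cancel message more concrete, here is a minimal sketch of what one looks like on the wire. It assumes the classic Usenet article format, in which a cancel is an ordinary article carrying a "Control: cancel <message-id>" header. The function name build_cancel, the addresses and the message ID are made up for illustration, and a real article would need further headers (Message-ID, Date, Path) that are left out here.

```python
def build_cancel(target_message_id, newsgroups, sender):
    """Return the text of a cancel control message for one article.

    Illustrative only: this is not a complete, postable article.
    """
    # Message IDs are written inside angle brackets, e.g. <abc@host>.
    if not (target_message_id.startswith("<") and target_message_id.endswith(">")):
        raise ValueError("message ID must be enclosed in angle brackets")

    headers = [
        f"From: {sender}",
        f"Newsgroups: {newsgroups}",
        # Conventional subject line for a cancel.
        f"Subject: cmsg cancel {target_message_id}",
        # The header that news servers actually act on.
        f"Control: cancel {target_message_id}",
    ]
    body = "Cancelled as excessive multiple posting."
    # Headers and body are separated by one empty line.
    return "\n".join(headers) + "\n\n" + body + "\n"


if __name__ == "__main__":
    # All values below are hypothetical.
    print(build_cancel("<spam123@example.invalid>",
                       "misc.test",
                       "admin@example.invalid"))
```

Whether a server honours such a message, especially one sent by a third party, depends on that server's own policy.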

The Breidbart Index (BI), developed by Seth Breidbart, provides a measure of severity of newsgroup spam by calculating the breadth of any multi-posting, cross-posting, or combination of the two. BI is defined as the sum of the square roots of how many newsgroups each article was posted to. If that number approaches 20, then the posts will probably be cancelled by somebody.
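
The definition above is short enough to sketch in a few lines of Python. This is only an illustration: the function names and the sample numbers are made up, and the cut-off of 20 is simply the figure mentioned above, treated here as a hard threshold for the sake of the example.

```python
import math

# Figure taken from the text above; treating it as a hard cut-off is an assumption.
CANCEL_THRESHOLD = 20


def breidbart_index(groups_per_copy):
    """Sum of the square roots of the number of newsgroups
    each copy of a message was posted to."""
    total = 0.0
    for n in groups_per_copy:
        if n < 1:
            raise ValueError("each copy must be posted to at least one newsgroup")
        total += math.sqrt(n)
    return total


def likely_cancelled(groups_per_copy):
    """True if the BI reaches the threshold assumed above."""
    return breidbart_index(groups_per_copy) >= CANCEL_THRESHOLD


if __name__ == "__main__":
    # Three copies of one message, cross-posted to 9, 9 and 4 groups:
    # sqrt(9) + sqrt(9) + sqrt(4) = 3 + 3 + 2
    print(breidbart_index([9, 9, 4]))   # 8.0
    # Twenty separate copies, one newsgroup each: 20 * sqrt(1)
    print(breidbart_index([1] * 20))    # 20.0
    print(likely_cancelled([1] * 20))   # True
```

Note how the square root penalizes multi-posting more than cross-posting: one article cross-posted to 16 groups scores 4, while 16 separate copies score 16.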


The use of the BI and spam-detection software has led to Usenet being policed by anti-spam volunteers, who purge newsgroups of spam by sending cancels and filtering it out on the way into servers.

A form of spam closely related to newsgroup spam is forum spam. It usually consists of links, with the dual goals of increasing search engine visibility in highly competitive areas such as sexual invigoration, weight loss, pharmaceuticals, gambling, pornography, real estate or loans, and generating more traffic for these commercial websites.
Spam posts may contain anything from a single link to dozens of links. Text content is minimal, usually innocuous and unrelated to the forum's topic. Full banner advertisements have also been reported.
Alternatively, the spam links are posted in the user's signature, where they are more likely to be approved by forum administrators and moderators.
Spam can also be described as posts that have no relevance to the thread's topic, or that have no purpose in general (e.g., a user typing "CABBAGES!" or other such useless posts in an important news thread).

When Google bought the Usenet archives in 2001, it provided a web interface to text groups (thus turning them into some kind of web forums) through Google Groups, from which more than 800 million messages dating back to 1981 can be accessed.
There are some especially memorable articles and threads in these archives, such as Tim Berners-Lee's announcement of what became the World Wide Web:
http://groups.google.com/groups?selm=6487%40cernvax.cern.ch
or Linus Torvalds' post about his "pet project":
http://groups.google.com/groups?selm=1991Oct5.054106.4647%40klaava.Helsinki.FI
You can view a selection of the most relevant posts here:
http://www.google.com/googlegroups/archive_announce_20.html

But Google Groups is responsible for the larger share of the spam that floods Usenet nowadays. Google Groups isn't the only source, but it is the one that makes it easiest for spammers to carry out their irritating activities.
It's so easy to spam Usenet through Google Groups that some infamous spammers have been doing so for years. Perhaps the best known of all is the MI-5 Persecution spammer, who works his way across just about every newsgroup with rambling postings that often appear in clusters of 20 or more messages, all related to Mike Corley's perceived persecution by MI5, the British intelligence agency. This UK-based spammer readily admits in several of his postings that he suffers from mental illness. He exasperates other users so much that some of them have even offered their services to MI5 to personally finish the job.

The solution, IMHO, is to implement the Breidbart Index in Google Groups. That should be an easy task for a company that excels at implementing all kinds of algorithms in its search engine, so I just can't understand what they are waiting for.



More Info:
Newsgroup Spam
