<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Open Source Web Thoughts &#187; architecture</title>
	<atom:link href="http://blog.dewaldbotha.co.za/category/architecture/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.dewaldbotha.co.za</link>
	<description></description>
	<lastBuildDate>Wed, 08 Apr 2009 11:17:19 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>the down-low on mobile device detection</title>
		<link>http://blog.dewaldbotha.co.za/2009/02/19/the-down-low-on-mobile-device-detection/</link>
		<comments>http://blog.dewaldbotha.co.za/2009/02/19/the-down-low-on-mobile-device-detection/#comments</comments>
		<pubDate>Thu, 19 Feb 2009 14:28:09 +0000</pubDate>
		<dc:creator>dewaldbotha</dc:creator>
				<category><![CDATA[apache]]></category>
		<category><![CDATA[architecture]]></category>
		<category><![CDATA[caching]]></category>
		<category><![CDATA[mobile]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[deviceatlas]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[memcached]]></category>
		<category><![CDATA[mobile device detection]]></category>
		<category><![CDATA[php alternative cache]]></category>
		<category><![CDATA[php apc]]></category>
		<category><![CDATA[wurfl]]></category>

		<guid isPermaLink="false">http://blog.dewaldbotha.co.za/2009-02-19/the-down-low-on-mobile-device-detection/</guid>
		<description><![CDATA[so &#8211; you say you want to detect which mobile devices hit your site? &#8211; in the past, this has been a bit of an issue, but lately &#8211; with really nice projects available out there such as WURFL or DeviceAtlas, you are able to concentrate harder on other issues, instead of having to write <a href="http://blog.dewaldbotha.co.za/2009/02/19/the-down-low-on-mobile-device-detection/" class="more-link">More &#62;</a>]]></description>
			<content:encoded><![CDATA[<p>so &#8211; you say you want to detect which mobile devices hit your site? &#8211; in the past, this has been a bit of an issue, but lately &#8211; with really nice projects available out there such as <a href="http://wurfl.sourceforge.net/" title="WURFL" target="_blank">WURFL</a> or <a href="http://deviceatlas.com/" title="DeviceAtlas" target="_blank">DeviceAtlas</a>, you are able to concentrate harder on other issues, instead of having to write a complete library of your own.</p>
<p>so for this, i&#8217;ve decided on <a href="http://deviceatlas.com/" title="DeviceAtlas" target="_blank">DeviceAtlas</a>.  just head on over to <a href="http://deviceatlas.com/" title="DeviceAtlas" target="_blank">DeviceAtlas</a> and open a developer account &#8211; you will get a one year developer&#8217;s license to play around with and see how cool it is.</p>
<p>after registering &#8211; click on the <a href="http://deviceatlas.com/downloads" title="download DeviceAtlas" target="_blank">downloads</a> link &#8211; then go to the php example and download the source files.  after de-tar-and-un-zipping the file, all we are really interested in at the moment is the <em><strong>Mobi/Mtld/DA</strong></em> directory and its contents &#8211; also create a <em><strong>json</strong></em> directory and drop the <em><strong>Sample.json</strong></em> file inside.</p>
<p>dump everything into a web directory somewhere. <img src='http://blog.dewaldbotha.co.za/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p><span id="more-16"></span>create a <em><strong>deviceatlas.php</strong></em> file in the working directory root and put the following inside.</p>
<pre lang="php">// include the DeviceAtlas API
include 'Mobi/Mtld/DA/Api.php';

// fetch the previously populated apc cache instance into a variable
$tree = apc_fetch('tree');

// get the properties from a variable containing a json tree loaded from cache
$properties = Mobi_Mtld_DA_Api::getProperties($tree, $_SERVER['HTTP_USER_AGENT']);

// output the phone properties for your viewing pleasure
print_r($properties);</pre>
<p>&#8216;hhmmm, that&#8217;s it?&#8217;, you might ask, and if you were clever you would have noticed something strange in the code above.</p>
<p>yip, that is pretty much it &#8211; but you also need caching.  the <em><strong>apc_fetch(&#8216;tree&#8217;)</strong></em> command in the code above actually fetches the json file already dumped into a &#8216;tree&#8217; cache instance in <a href="http://www.php.net/manual/en/intro.apc.php" title="PHP APC Cache" target="_blank">PHP APC</a> (Alternative PHP Cache).</p>
<p>why <a href="http://www.php.net/manual/en/intro.apc.php" title="PHP APC Cache" target="_blank">PHP APC</a>, you may also ask.  well, i could have used the much-hyped <a href="http://www.danga.com/memcached/" title="Memcached" target="_blank">memcached</a> (<a href="http://www.phpbuilder.com/board/archive/index.php/t-10346692.html" title="Memcache vs APC" target="_blank">you might want to read this</a>), but there is some method to the madness.  with <a href="http://www.php.net/manual/en/intro.apc.php" title="PHP APC Cache" target="_blank">PHP APC</a>, the cache sits in shared memory on the web server itself, so every request handled by that server reads the same copy.  it is not accessible from other servers, but this is not really a con in my mind.</p>
<p>if you wish to scale your application later on and add another server, there are going to be some issues, you might say &#8211; but there will also be issues if you have millions of users hitting one central instance of <a href="http://deviceatlas.com/" title="DeviceAtlas" target="_blank">DeviceAtlas</a>.  so, having one instance of <a href="http://deviceatlas.com/" title="DeviceAtlas" target="_blank">DeviceAtlas</a> per server might not be too much of an issue.  also when it comes to updating the json file, you might want to consider the <a href="http://en.wikipedia.org/wiki/Single_point_of_failure" title="Single Point of Failure" target="_blank">single point of failure</a> issue of just having one server.</p>
<p>so have a couple of servers, and just write some scripts to update the single instances of <a href="http://deviceatlas.com/" title="DeviceAtlas" target="_blank">DeviceAtlas</a>, rather than updating one, already busy server which still needs a cache update.</p>
<p>but, back to business.</p>
<p>first you have to install <a href="http://www.php.net/manual/en/intro.apc.php" title="PHP APC Cache" target="_blank">PHP APC</a>.  i found <a href="http://www.howtoforge.com/apc-php5-apache2-debian-etch" title="Apc Installation guide" target="_blank">this handy guide</a> on the net, which helped me install it on my <a href="http://en.wikipedia.org/wiki/Debian" title="Debian" target="_blank">debian</a> system and integrate it with <a href="http://en.wikipedia.org/wiki/Apache_server" title="Apache" target="_blank">apache</a> and <a href="http://en.wikipedia.org/wiki/Php5" title="PHP 5" target="_blank">php5</a> &#8211; there is also a guide at the bottom for <a href="http://en.wikipedia.org/wiki/Red_hat" title="Red Hat" target="_blank">fedora</a> based machines, so don&#8217;t fret my <a href="http://en.wikipedia.org/wiki/Linux" title="Linux" target="_blank">linux</a> loving friends <img src='http://blog.dewaldbotha.co.za/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>you can create a php script to run the following &#8211; which should cache your <a href="http://deviceatlas.com/" title="DeviceAtlas" target="_blank">DeviceAtlas</a> tree into <a href="http://www.php.net/manual/en/intro.apc.php" title="PHP APC Cache" target="_blank">PHP APC</a>.</p>
<pre lang="php">// include the DeviceAtlas API
include 'Mobi/Mtld/DA/Api.php';

// add the sample json data from DeviceAtlas to the cache under the key 'tree'
apc_add('tree', Mobi_Mtld_DA_Api::getTreeFromFile("json/Sample.json"));</pre>
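<p>one thing to keep in mind: apc_fetch() returns false when the key is not in the cache, for example after apache has been restarted and the cache is empty again.  a minimal sketch of a fallback, assuming the same file layout as above &#8211; the <em>get_device_tree()</em> helper name is made up for this example and is not part of the DeviceAtlas api:</p>
<pre lang="php">include 'Mobi/Mtld/DA/Api.php';

// hypothetical helper (not part of DeviceAtlas): return the cached tree,
// re-populating the cache from the json file when the key is missing
function get_device_tree($file = 'json/Sample.json') {
    $tree = apc_fetch('tree');

    // apc_fetch() returns false on a cache miss
    if ($tree === false) {
        $tree = Mobi_Mtld_DA_Api::getTreeFromFile($file);
        apc_add('tree', $tree);
    }

    return $tree;
}

$properties = Mobi_Mtld_DA_Api::getProperties(get_device_tree(), $_SERVER['HTTP_USER_AGENT']);</pre>
<p>this way the first request after a restart pays for parsing the json file, and every request after that is served from the cache again.</p>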
<p>from the first code example, the output might look something like this (depending on your phone)</p>
<p>Array<br />
(<br />
[wmv] =&gt; 0<br />
[vendor] =&gt; Nokia<br />
[mobileDevice] =&gt; 1<br />
[memoryLimitMarkup] =&gt; 357000<br />
[memoryLimitDownload] =&gt; 61440<br />
[midiMonophonic] =&gt; 1<br />
[midiPolyphonic] =&gt; 1<br />
[mpeg4] =&gt; 1<br />
[3gpp] =&gt; 1<br />
[drmOmaForwardLock] =&gt; 1<br />
[drmOmaCombinedDelivery] =&gt; 1<br />
[drmOmaSeparateDelivery] =&gt; 1<br />
[markup.xhtmlMp10] =&gt; 1<br />
[markup.xhtmlBasic10] =&gt; 1<br />
[image.Jpg] =&gt; 1<br />
[image.Png] =&gt; 1<br />
[amr] =&gt; 1<br />
[mp3] =&gt; 1<br />
[aac] =&gt; 1<br />
[h263Type0InVideo] =&gt; 1<br />
[gprs] =&gt; 1<br />
[edge] =&gt; 1<br />
[image.Gif87] =&gt; 1<br />
[uriSchemeTel] =&gt; 0<br />
[qcelp] =&gt; 0<br />
[hsdpa] =&gt; 0<br />
[amrInVideo] =&gt; 1<br />
[3gpp2] =&gt; 0<br />
[displayColorDepth] =&gt; 18<br />
[model] =&gt; N70<br />
[https] =&gt; 1<br />
[image.Gif89a] =&gt; 0<br />
[umts] =&gt; 0<br />
[displayWidth] =&gt; 176<br />
[displayHeight] =&gt; 208<br />
[markup.xhtmlMp11] =&gt; 0<br />
[markup.xhtmlMp12] =&gt; 0<br />
[csd] =&gt; 0<br />
[hscsd] =&gt; 0<br />
[id] =&gt; 204255<br />
[_matched] =&gt; NokiaN70<br />
[_unmatched] =&gt; -1/5.0609.2.0.1 Series60/2.8 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Link/6.3.1.13.0<br />
)</p>
<p><a href="http://deviceatlas.com/properties" title="DeviceAtlas properties" target="_blank">you can view the attribute explanations here </a></p>
<p>now you can use the phone&#8217;s specific properties to your advantage.  you also might want to look at the <a href="https://addons.mozilla.org/en-US/firefox/addon/59" title="User agent switcher" target="_blank">user agent switcher add-on</a> for firefox for testing purposes.</p>
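<p>as a rough sketch of what that could look like, the snippet below picks a template based on a couple of the properties from the output above.  the template file names and the 240 pixel cut-off are made up for the example, and the hard-coded array only stands in for the real getProperties() result.</p>
<pre lang="php">// stand-in for the array returned by getProperties() (values taken from the N70 output above)
$properties = array('mobileDevice' => 1, 'displayWidth' => 176, 'displayHeight' => 208);

// hypothetical helper: the template names and the 240px cut-off are assumptions
function choose_template($properties) {
    // treat a missing or zero mobileDevice flag as a desktop browser
    if (empty($properties['mobileDevice'])) {
        return 'desktop.php';
    }

    // wider screens get the roomier mobile layout
    $width = isset($properties['displayWidth']) ? (int) $properties['displayWidth'] : 0;
    return ($width >= 240) ? 'mobile-wide.php' : 'mobile-narrow.php';
}

echo choose_template($properties); // prints mobile-narrow.php</pre>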
<p>that&#8217;s it.  now every once in a while you can just update the json file, delete the current cache (<a href="http://www.php.net/manual/en/function.apc-delete.php" title="APC Delete Cache" target="_blank">http://www.php.net/manual/en/function.apc-delete.php</a>) and bob&#8217;s your device detecting, apc caching uncle.</p>
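<p>the refresh step could be sketched like this, assuming the new json file has been dropped over the old one at the same path.  note that the script has to be requested through the web server: a php process started from the command line has its own separate apc cache, so clearing that one would not touch the cache your site uses.</p>
<pre lang="php">include 'Mobi/Mtld/DA/Api.php';

// parse the updated json file first, so the old tree stays cached while this runs
$tree = Mobi_Mtld_DA_Api::getTreeFromFile('json/Sample.json');

// remove the stale entry and cache the new tree under the same key
apc_delete('tree');
apc_add('tree', $tree);</pre>
<p>apc_store() would also do the job in one step, since it overwrites an existing key, whereas apc_add() only writes when the key is not there yet.</p>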
]]></content:encoded>
			<wfw:commentRss>http://blog.dewaldbotha.co.za/2009/02/19/the-down-low-on-mobile-device-detection/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>applying game theory patterns to development</title>
		<link>http://blog.dewaldbotha.co.za/2009/02/11/applying-game-theory-patterns-to-development/</link>
		<comments>http://blog.dewaldbotha.co.za/2009/02/11/applying-game-theory-patterns-to-development/#comments</comments>
		<pubDate>Wed, 11 Feb 2009 07:29:23 +0000</pubDate>
		<dc:creator>dewaldbotha</dc:creator>
				<category><![CDATA[architecture]]></category>
		<category><![CDATA[design patterns]]></category>
		<category><![CDATA[mvc]]></category>
		<category><![CDATA[game theory pattern]]></category>
		<category><![CDATA[software architecture]]></category>

		<guid isPermaLink="false">http://blog.dewaldbotha.co.za/2009-02-11/applying-game-theory-patterns-to-development/</guid>
		<description><![CDATA[mobile &#8211; that damned device that makes our life so easy, yet sometimes so inherently difficult.
as a developer, we kind of try and convince ourselves that developing for mobile and developing for a desktop browser is kind of the same thing.  but we all know that this is a stalling technique for the inevitable, since <a href="http://blog.dewaldbotha.co.za/2009/02/11/applying-game-theory-patterns-to-development/" class="more-link">More &#62;</a>]]></description>
			<content:encoded><![CDATA[<p>mobile &#8211; that damned device that makes our life so easy, yet sometimes so inherently difficult.</p>
<p>as developers, we kind of try and convince ourselves that developing for <a href="http://mobiforge.com" title="Mobiforge" target="_blank">mobile</a> and developing for a desktop browser is kind of the same thing.  but we all know that this is a stalling technique for the inevitable, since invariably it becomes a whole different field of play.</p>
<p>every little feature you add, every little flow created and every branch of navigational hierarchy is a challenge on its own.</p>
<p><strong>enter the game theory pattern</strong></p>
<p><span id="more-14"></span>this is where my idea of a <a href="http://en.wikipedia.org/wiki/Game_theory" title="Game Theory" target="_blank">game theory</a> pattern came into play.  game theory is a branch of applied mathematics, used in the social sciences (most notably economics) as well as in biology, engineering and computer science, where it turns up in areas such as artificial intelligence.</p>
<p>thanks <a href="http://wikipedia.org" title="Wikipedia" target="_blank">wikipedia</a> <img src='http://blog.dewaldbotha.co.za/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>game theory allows someone to apply a bit of mathematics to capture data from a situation and use that data to build a strategic base, which allows you to make decisions to determine your success.</p>
<p>take for example morning traffic &#8211; you have 2 routes to take to work &#8211; one a bit longer than the other, but the longer route is less likely to be busy.</p>
<p>what do you do?  most of us will take the pro&#8217;s and con&#8217;s of both routes and use that to base our decision upon.</p>
<p>game theory works exactly, well almost, the same.  except you will probably use a bit of math to finalize your decision (average speed, distance etc.)</p>
<p>so why not use game theory in development, especially in a field like mobile, which is very limiting and very frustrating at first.  but the good kind of limitation, the kind of limitation that could possibly create innovation.  the guys behind a phenomenon like <a href="http://twitter.com" title="Twitter" target="_blank">twitter</a> decided to limit people to 140 characters.  and despite this limitation, people have found innovative ways to increase the effect of that 140 character limited textual communication.</p>
<p>game theory would allow us to mathematically decide on feature sets, the use of algorithms and objects, and could even change the flow of an application.</p>
<p>say on an example mobile application we have 3 major features &#8211; chat, activity, content sharing.</p>
<p>which one do we choose as our killer feature, and which would probably make the application more bloated and more problematic to use.  why not use the main variable in the equation to help you decide &#8211; the user.</p>
<p>i often secretly chuckle to myself when a system gets developed for one purpose and all the users start using it for a totally different end result, one that no one really ever thought of or intended.  you could use this to your advantage with game theory.</p>
<p>a simple case could be made using application flow as an example.  we have a menu with chat, activity feed and shared content.  which one is more important, which one would you make easier to access for the user.  by simply adding a click count to each feature, you could sort a menu by popularity, rather than obvious choice as this will invariably change someday.</p>
<p>if users decide this month, chat is the next best thing, let chat be number one on your menu &#8211; and i know some user interface expert will maintain that consistency is important, but if users consistently choose chat as their most important feature, what would the issue be?</p>
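<p>to make the click count idea a bit more concrete, here is a small sketch.  the counts are hard-coded, made-up numbers and the feature names come from the example above; in a real application they would be incremented on every click and stored somewhere like a database or a cache.</p>
<pre lang="php">// hypothetical click counts per menu item (made-up numbers)
$clicks = array(
    'activity feed'  => 45,
    'chat'           => 120,
    'shared content' => 80,
);

// hypothetical helper: order the menu by popularity, most clicked first
function menu_by_popularity($clicks) {
    // arsort() sorts by value, highest first, and keeps the keys attached
    arsort($clicks);
    return array_keys($clicks);
}

print_r(menu_by_popularity($clicks)); // chat, shared content, activity feed</pre>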
<p>this would change the way you develop and architect software as well.  since you know chat is gaining in popularity according to your available data, you should be able to utilize servers better for your chat functionality.  write scaling scripts which use database sharding.  and on the other hand, allow your game theory pattern to simplify things &#8211; why use 40 slave/master instances of a database if all you need is one.  eliminate unnecessary overheads in that way &#8211; more layers make for more complexity, which makes for more overhead, and will eventually kill your application.</p>
<p>i&#8217;ve still got to convince a couple of people to maybe try this game theory pattern approach to development and if, by chance, <a href="http://martinfowler.com/" title="Martin Fowler" target="_blank">martin fowler</a> reads this article:</p>
<p>&#8216;please martin, consider this pattern as a topic for a new book! <img src='http://blog.dewaldbotha.co.za/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> &#8217;</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.dewaldbotha.co.za/2009/02/11/applying-game-theory-patterns-to-development/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>the great search balancing act</title>
		<link>http://blog.dewaldbotha.co.za/2009/01/14/the-great-search-balancing-act/</link>
		<comments>http://blog.dewaldbotha.co.za/2009/01/14/the-great-search-balancing-act/#comments</comments>
		<pubDate>Wed, 14 Jan 2009 06:29:24 +0000</pubDate>
		<dc:creator>dewaldbotha</dc:creator>
				<category><![CDATA[architecture]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[tomcat]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[haproxy]]></category>
		<category><![CDATA[keepalived]]></category>
		<category><![CDATA[load balancing]]></category>
		<category><![CDATA[replication]]></category>
		<category><![CDATA[search engine]]></category>

		<guid isPermaLink="false">http://blog.dewaldbotha.co.za/2009-01-14/the-great-search-balancing-act/</guid>
		<description><![CDATA[it&#8217;s been a while since my last post &#8211; and as interests fade with time, others jump up faster than a beach ball at a nickelback concert.
so i&#8217;ve been looking into solr the last couple of days.  solr is relatively new in the arena and probably outshone a bit in popularity by other search engines <a href="http://blog.dewaldbotha.co.za/2009/01/14/the-great-search-balancing-act/" class="more-link">More &#62;</a>]]></description>
			<content:encoded><![CDATA[<p>it&#8217;s been a while since my last post &#8211; and as interests fade with time, others jump up faster than a beach ball at a nickelback concert.</p>
<p>so i&#8217;ve been looking into <a href="http://lucene.apache.org/solr/" title="Solr" target="_blank">solr</a> the last couple of days.  <a href="http://lucene.apache.org/solr/" title="Solr" target="_blank">solr</a> is relatively new in the arena and probably outshone a bit in popularity by other search engines such as <a href="http://lucene.apache.org/" title="Lucene" target="_blank">lucene</a> and <a href="http://lucene.apache.org/nutch/" title="Nutch" target="_blank">nutch</a>.  &#8220;but why <a href="http://lucene.apache.org/solr/" title="Solr" target="_blank">solr</a>?&#8221;, you may find yourself asking.</p>
<p>well, <a href="http://lucene.apache.org/solr/" title="Solr" target="_blank">solr</a> has a couple of tricks up its sleeve &#8211; which is likely due to the fact that it&#8217;s a fresher take on the old, dare i call it legacy, search engines.</p>
<p><span id="more-9"></span><strong>some of the features of <a href="http://lucene.apache.org/solr/" title="Solr" target="_blank">solr</a> include:</strong></p>
<ul>
<li>highly scalable <a href="http://www.java.com/en/" title="Java" target="_blank">java</a> search server (as i will try to show you through this article)</li>
<li>works with the <a href="http://lucene.apache.org/" title="Lucene" target="_blank">lucene</a> search library (tried and tested)</li>
<li>you can update your engine with <a href="http://en.wikipedia.org/wiki/XML" title="XML" target="_blank">xml</a> using a kind of lightweight <a href="http://java.sun.com/developer/technicalArticles/WebServices/restful/" title="Restful" target="_blank">restful</a> service.</li>
<li>can parse html, openoffice, microsoft office suite documents, pdfs etc. using <a href="http://lucene.apache.org/" title="Lucene" target="_blank">lucene</a> parsers.</li>
<li>custom tokenizer, filter and analyzer  steps for control over indexing and query processing</li>
<li>extremely rich indexing of fields and metadata, including numbers</li>
<li>can combine fields for fulltext-type searching</li>
<li>tuned for high performance even when updating</li>
<li>spell checkers included &#8211; and that is only to name a couple of features.</li>
</ul>
<p><a href="http://lucene.apache.org/solr/features.html" title="Solr Features" target="_blank">click here to view a more complete list.</a></p>
<p><a href="http://lucene.apache.org/solr/" title="Solr" target="_blank">solr</a> currently has apis for <a href="http://www.ruby-lang.org/en/" title="Ruby" target="_blank">ruby</a>, <a href="http://www.php.net/" title="PHP" target="_blank">php</a>, <a href="http://www.python.org/" title="Python" target="_blank">python</a>, <a href="http://www.json.org/" title="JSON" target="_blank">json</a>, <a href="http://forrest.apache.org/" title="Forrest" target="_blank">forrest</a>/<a href="http://cocoon.apache.org/" title="Cocoon" target="_blank">cocoon</a>.</p>
<p>the obvious elephant in the room is that solr does not ship with a crawler of some sort.  luckily some friendly open source guys already <a href="http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html" title="Integrate Nutch with Solr as a crawler" target="_blank">released a guide for integrating nutch</a> with <a href="http://lucene.apache.org/solr/" title="Solr" target="_blank">solr</a> for an all round experience.</p>
<p>you might also want to look into <a href="http://www.gissearch.com/localsolr" title="localsolr" target="_blank">localsolr</a> &#8211; which enables geographical and spatial searches with <a href="http://lucene.apache.org/solr/" title="Solr" target="_blank">solr</a>.</p>
<h3>now&#8230; where to begin:</h3>
<p>first things first &#8211; since <a href="http://lucene.apache.org/solr/" title="Solr" target="_blank">solr</a> is <a href="http://www.java.com/en/" title="Java" target="_blank">java</a> based, it has to run inside a <a href="http://www.java.com/en/" title="Java" target="_blank">java</a> servlet container of some sort.  i chose the friendly open source <a href="http://tomcat.apache.org/" title="Apache Tomcat" target="_blank">tomcat</a> as a basis.  i installed <a href="http://tomcat.apache.org/" title="Apache Tomcat" target="_blank">tomcat</a> 5.5 using <a href="http://linux-sxs.org/internet_serving/c140.html" title="Tomcat Installation" target="_blank">this helpful guide as a reference</a>.  read a bit more than what is required and make sure you understand the concept of using web applications and the configuration of <a href="http://tomcat.apache.org/" title="Apache Tomcat" target="_blank">tomcat</a>.</p>
<p>after <a href="http://tomcat.apache.org/" title="Apache Tomcat" target="_blank">tomcat</a> is installed, it&#8217;s time for <a href="http://lucene.apache.org/solr/" title="Solr" target="_blank">solr</a>.  and as luck would have it <img src='http://blog.dewaldbotha.co.za/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />  &#8211; a <a href="http://lucene.apache.org/solr/" title="Solr" target="_blank">solr</a> installation in <a href="http://tomcat.apache.org/" title="Apache Tomcat" target="_blank">tomcat</a> is probably one of the easiest, most straightforward ways of doing it &#8211; and i <a href="http://wiki.apache.org/solr/SolrTomcat" title="Install Solr in Tomcat" target="_blank">just followed this handy guide</a>, provided by <a href="http://lucene.apache.org/solr/" title="Solr" target="_blank">solr</a>, as a reference.</p>
<p>now after <a href="http://lucene.apache.org/solr/" title="Solr" target="_blank">solr</a> is up and running, you should be able to access your admin screen via the following url <a href="http://localhost:8180/solr/" title="Solr" target="_blank">http://127.0.0.1:8180/solr</a></p>
<p>that would be assuming your <a href="http://tomcat.apache.org/" title="Apache Tomcat" target="_blank">tomcat</a> port is 8180 and <a href="http://lucene.apache.org/solr/" title="Solr" target="_blank">solr</a> is installed under the directory <a href="http://lucene.apache.org/solr/" title="Solr" target="_blank">solr</a> in your web applications.</p>
<p>now after installing <a href="http://lucene.apache.org/solr/" title="Solr" target="_blank">solr</a> &#8211; and accessing the admin screen &#8211; you should be able to view your configuration and perform a basic search.  updating the schema and importing data into <a href="http://lucene.apache.org/solr/" title="Solr" target="_blank">solr</a> is a whole other post on its own, and as soon as i figure it out, you will be the first to know &#8211; but for now we are going to focus on replication and scalability.</p>
<p>look at the following diagram (done by me using <a href="http://www.gliffy.com/" title="Gliffy" target="_blank">gliffy</a>)</p>
<p style="text-align: center"><img src="http://farm4.static.flickr.com/3397/3194302830_7100b86480_o.jpg" alt="Solr Replication and Load Balancing" /></p>
<p>as you can see above &#8211; there are 3 main levels</p>
<ol>
<li>load balancers (wrapped in a monitoring service &#8211; <a href="http://www.keepalived.org/" title="Keepalived" target="_blank">keepalived</a> which is optional)</li>
<li><a href="http://lucene.apache.org/solr/" title="Solr" target="_blank">solr</a> slave instances</li>
<li><a href="http://lucene.apache.org/solr/" title="Solr" target="_blank">solr</a> master</li>
</ol>
<p>you can see that it isn&#8217;t the most original idea in the book, but it works.  especially if you should run it within a cloud where your instances could almost be infinite.</p>
<h3>searching with <a href="http://lucene.apache.org/solr/" title="Solr" target="_blank">solr</a></h3>
<p>let&#8217;s start from the bottom up &#8211; at the base &#8211; you have the <a href="http://lucene.apache.org/solr/" title="Solr" target="_blank">solr</a> master &#8211; which would be the <a href="http://lucene.apache.org/solr/" title="Solr" target="_blank">solr</a> instance where all your data importing from <a href="http://en.wikipedia.org/wiki/XML" title="XML" target="_blank">xml</a> happens.</p>
<p>above that are the slave instances, which replicate their indexes from the master.  the slaves in turn handle all search requests coming from the load balancers, which sit at the top.  this reduces load on the master, which is used as the primary indexer.</p>
<p>the load balancers will also be wrapped in a monitoring system, which will alert you in case something goes wrong, and also reduce the risk of a single point of entry failure.  this monitoring system is really optional, since the load balancer i will be using has built in failover checks.  but nevertheless it is good practice to look into a monitoring system like <a href="http://www.keepalived.org/" title="Keepalived" target="_blank">keepalived</a> for your architecture.<a href="http://www.howtoforge.com/high-availability-load-balancer-haproxy-heartbeat-debian-etch" title="HaProxy Multiple load balancers on shared ip" target="_blank"></a></p>
<p>now for the specifics &#8211; to update the index of your master is almost a trivial exercise &#8211; since <a href="http://lucene.apache.org/solr/" title="Solr" target="_blank">solr</a> has a kind of <a href="http://java.sun.com/developer/technicalArticles/WebServices/restful/" title="Restful" target="_blank">restful</a> approach to doing this -</p>
<p><code><strong>#!/bin/sh<br />
#the following would be in a .sh file<br />
FILES=$*<br />
URL=http://localhost:8180/solr/update</strong></code><br />
<code><br />
<strong>#post every file given on the command line to the update handler<br />
for f in $FILES; do<br />
echo "Posting file $f to $URL"<br />
curl "$URL" --data-binary @"$f" -H 'Content-type:text/xml; charset=utf-8'<br />
done<br />
</strong></code><code><br />
<strong>#send the commit command to make sure all the changes are flushed and visible<br />
curl "$URL" --data-binary '&lt;commit/&gt;' -H 'Content-type:text/xml; charset=utf-8'</strong></code></p>
<p>so &#8211; if you save the above script as, e.g., post.sh and make it executable (chmod a+x post.sh), then you could run it with an xml dataset (./post.sh dataset.xml), and it will update the index using its semi restful interface.  all you have to do is change your url in the above script to the <a href="http://lucene.apache.org/solr/" title="Solr" target="_blank">solr</a> instance you wish to update.</p>
<p>then comes the replication of data to the slaves &#8211; this is done by using scripts in the bin directory of your <a href="http://lucene.apache.org/solr/" title="Solr" target="_blank">solr</a> installation.  <a href="http://wiki.apache.org/solr/SolrCollectionDistributionScripts" title="Solr Update Scripts" target="_blank">here are the descriptions of these scripts</a> from the <a href="http://lucene.apache.org/solr/" title="Solr" target="_blank">solr</a> website.  we would most likely be interested in the snapshooter &#8211; which takes a snapshot of the current master index &#8211; and from there you would use the snappuller on each slave &#8211; which pulls the latest snapshot from the master &#8211; followed by the snapinstaller, which installs that snapshot into the slave&#8217;s index.</p>
<p>after you&#8217;ve mastered these scripts, you can run them as cron jobs every so often and update the slave indexes.  i&#8217;ve also read on their site that in the new release they will be building replication into <a href="http://lucene.apache.org/solr/" title="Solr" target="_blank">solr</a> as a feature, where you can actually just tweak a couple of configuration settings, and it should replicate without external tools, so just keep a lookout for the new <a href="http://lucene.apache.org/solr/" title="Solr" target="_blank">solr</a> version.</p>
<h3>load balancing with <a href="http://haproxy.1wt.eu/" title="Haproxy" target="_blank">haproxy</a></h3>
<p>so as a load balancing solution i chose <a href="http://haproxy.1wt.eu/" title="Haproxy" target="_blank">haproxy</a>, which offers high availability and balancing for tcp and http based applications.  <a href="http://haproxy.1wt.eu/" title="Haproxy" target="_blank">haproxy</a> does not serve content the way <a href="http://www.apache.org/" title="Apache" target="_blank">apache</a> does, nor does it cache the way <a href="http://www.squid-cache.org/" title="squid" target="_blank">squid</a> does, but it&#8217;s small, simple and works very well.</p>
<p>to install <a href="http://haproxy.1wt.eu/" title="Haproxy" target="_blank">haproxy</a>:</p>
<p><code><strong>mkdir /opt/haproxy<br />
cd /opt/haproxy<br />
wget http://haproxy.1wt.eu/download/1.3/src/haproxy-1.3.15.7.tar.gz<br />
gunzip haproxy-1.3.15.7.tar.gz<br />
tar -xf haproxy-1.3.15.7.tar<br />
cd haproxy-1.3.15.7<br />
make<br />
cp ./haproxy /usr/bin/haproxy</strong><br />
</code><br />
then create a file called haproxy.cfg that contains the following data:</p>
<p><code><br />
<strong>global<br />
log 127.0.0.1   local0               #logs all haproxy info to local0 log<br />
log 127.0.0.1   local1 notice    #logs all notifications to local1 log</strong></code><br />
<code><br />
<strong>daemon                                     #specifies haproxy to run as a daemon instance<br />
maxconn         4096                 # total max connections (dependent on ulimit)</strong></code><br />
<code><br />
<strong>defaults   #setup some default values<br />
log            global<br />
mode       http<br />
option      httplog<br />
option      dontlognull</strong><br />
</code><code><br />
<strong>clitimeout        60000       # maximum inactivity time on the client side<br />
srvtimeout        30000       # maximum inactivity time on the server side<br />
timeout connect   4000        # maximum time to wait for a connection attempt to a server to succeed</strong><br />
</code><code><br />
<strong>option            httpclose     # disable keepalive (HAProxy does not yet support the HTTP keep-alive mode)<br />
option            abortonclose  # enable early dropping of aborted requests from pending queue<br />
option            httpchk       # enable HTTP protocol to check on servers health<br />
option            forwardfor    # enable insert of X-Forwarded-For headers</strong><br />
</code><code><br />
<strong>balance roundrobin            # each server is used in turns, according to assigned weight</strong><br />
</code><code><br />
<strong>stats enable                  # enable web-stats at /haproxy?stats<br />
stats auth        admin:pass  # force HTTP Auth to view stats<br />
stats refresh     5s        # refresh rate of stats page</strong><br />
</code><code><br />
<strong>listen myloadbalancer 127.0.0.1:8888 #where the loadbalancer should listen for requests<br />
server slave1 192.168.1.6:8180 #a slave</strong></code><br />
<code><strong>server slave2 192.168.1.6:8180 #another slave</strong><br />
</code><br />
&#8211;</p>
<p>and that is it.  now all you do is run:</p>
<p><code><strong>/usr/bin/haproxy -f haproxy.cfg</strong></code></p>
<p>and bob&#8217;s your uncle &#8211; just go to <a href="http://127.0.0.1:8888/haproxy?stats" title="Haproxy Admin" target="_blank">http://127.0.0.1:8888/haproxy?stats</a> to verify that it is running (the username is admin and the password is pass, as specified in the config file above).</p>
<p>just note &#8211; if you get binding errors, make sure the port you specify in the config file is free, i.e. not already in use by something like <a href="http://www.apache.org/" title="apache" target="_blank">apache</a>, <a href="http://tomcat.apache.org/" title="Apache Tomcat" target="_blank">tomcat</a>, <a href="http://yaws.hyber.org/" title="yaws" target="_blank">yaws</a> etc.</p>
<p>now if all your slave instances are running a version of <a href="http://lucene.apache.org/solr/" title="Solr" target="_blank">solr</a>, you should easily be able to connect to them in a round robin fashion, e.g. by connecting to <a href="http://127.0.0.1:8888/solr" title="Solr" target="_blank">http://127.0.0.1:8888/solr</a> &#8211; you can confirm this in the syslog or with the above stats link.</p>
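<p>one caveat on the config above: <code>option httpchk</code> only takes effect for servers whose <code>server</code> line carries the <code>check</code> keyword, so as written haproxy assumes the slaves are always up and never probes them. a minimal sketch of the same <code>listen</code> block with health checks switched on might look like this. it assumes the two slaves sit on different hosts (192.168.1.7 is a made-up address), that solr&#8217;s ping handler is enabled at /solr/admin/ping, and the interval and thresholds are only illustrative:</p>
<p><code><strong>listen myloadbalancer 127.0.0.1:8888<br />
option httpchk GET /solr/admin/ping    # assumption: solr answers pings at this path<br />
server slave1 192.168.1.6:8180 check inter 2000 rise 2 fall 3    # probe every 2s, up after 2 passes, down after 3 failures<br />
server slave2 192.168.1.7:8180 check inter 2000 rise 2 fall 3    # hypothetical second host</strong></code></p>
<p>with something like this in place, the stats page should show each slave going up or down as its checks pass or fail, and a failed slave drops out of the rotation until it recovers.</p>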
<h3>monitor that architecture, with <a href="http://www.keepalived.org/" title="Keepalived" target="_blank">keepalived</a></h3>
<p>last but not least is the load balancing monitor &#8211; <a href="http://www.keepalived.org/" title="Keepalived" target="_blank">keepalived</a> was suggested to me as a viable option. however, with <a href="http://haproxy.1wt.eu/" title="Haproxy" target="_blank">haproxy</a> you can do failover checks and <a href="http://www.howtoforge.com/high-availability-load-balancer-haproxy-heartbeat-debian-etch" title="HaProxy Multiple load balancers on shared ip" target="_blank">run multiple loadbalancers under a shared ip</a>, so this isn&#8217;t strictly necessary, but as explained earlier, it is good practice to run a monitor like <a href="http://www.keepalived.org/" title="Keepalived" target="_blank">keepalived</a> on your system.</p>
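<p>as a rough illustration of failover inside haproxy itself (one reading of the failover checks mentioned above, not part of the config from this post): a server marked <code>backup</code> only receives traffic once every regular server has failed its health checks. the standby address 192.168.1.8 is made up for the example:</p>
<p><code><strong>listen myloadbalancer 127.0.0.1:8888<br />
server slave1 192.168.1.6:8180 check             # regular server, health-checked<br />
server standby 192.168.1.8:8180 check backup     # hypothetical host, used only when all regular servers are down</strong></code></p>
<p>note that this only covers a slave dying. it does nothing for the load balancer box itself going down, which is the gap a shared ip setup is meant to close.</p>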
<h3>conclusion</h3>
<p>because it is late, and i still have to proofread this sucker, i haven&#8217;t really got around to <a href="http://www.keepalived.org/" title="Keepalived" target="_blank">keepalived</a>, although &#8211; do yourself a favor and read up on it &#8211; i know i certainly will have a cup of java and research it a bit more.</p>
<p>so &#8211; there you go &#8211; you have a fully load balanced system &#8211; running a search engine, which should scale horizontally with your hardware buying budget. <img src='http://blog.dewaldbotha.co.za/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://blog.dewaldbotha.co.za/2009/01/14/the-great-search-balancing-act/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
