<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>nick@ &#187; Linux</title>
	<atom:link href="http://kavassalis.com/tag/linux/feed/" rel="self" type="application/rss+xml" />
	<link>http://kavassalis.com</link>
	<description>code, carriers, cars, cooking, cameras</description>
	<lastBuildDate>Thu, 08 Dec 2011 01:57:43 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>IRIX porting: GCC extensions, missing functions and more fun</title>
		<link>http://kavassalis.com/2011/04/porting-gcc-extensions-missing-functions-and-more-fun/</link>
		<comments>http://kavassalis.com/2011/04/porting-gcc-extensions-missing-functions-and-more-fun/#comments</comments>
		<pubDate>Thu, 14 Apr 2011 13:47:57 +0000</pubDate>
		<dc:creator>nick</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[gcc]]></category>
		<category><![CDATA[irix]]></category>
		<category><![CDATA[Linux]]></category>

		<guid isPermaLink="false">http://kavassalis.com/?p=628</guid>
		<description><![CDATA[I&#8217;ve spun up an IRIX machine at home to replace my long dead Linux based file server / general network management box. (It&#8217;s a 2-node IP45 Origin rack, so 8 sockets of R14000 lovin&#8217;) I don&#8217;t really like to have more than 1 active server and 1 active desktop at home at a time for [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_631" class="wp-caption alignleft" style="width: 470px"><img class="size-full wp-image-631" title="Jurassic-Park-001" src="http://kavassalis.com/wp-content/uploads/2011/04/Jurassic-Park-001.jpg" alt="" width="460" height="276" /><p class="wp-caption-text">It&#39;s a UNIX system, I know this!</p></div>
<p>I&#8217;ve spun up an IRIX machine at home to replace my long-dead Linux-based file server / general network management box. <a href="http://kavassalis.com/wp-content/uploads/2011/04/1HJ6.jpg" target="_blank">(It&#8217;s a 2-node IP45 Origin rack, so 8 sockets of R14000 lovin&#8217;)</a> I don&#8217;t really like to have more than 1 active server and 1 active desktop at home at a time for simplicity / heat / space reasons. I&#8217;ve been an avid IRIX user since 1990, in the golden days of the Personal IRIS and 4D/480. It was my first UNIX and will always have a special place in my heart. SGI has retired the platform and it will go End of Support in 2013. The final generation of IRIX hardware is becoming outdated on a GFLOP/watt basis now, and thus affordable!</p>
<p>&nbsp;</p>
<p>IRIX is POSIX and POSIX.2 compliant, and really strictly so. It was based on SysV and later got several BSD extensions added, but anything that isn&#8217;t mandated by POSIX probably isn&#8217;t there unless they wanted it within the halls of SGI. Its profiling support, native debugging tools and general sysadmin usability are still unmatched. I can still install and configure an SGI box in my sleep, and I will definitely be sad the day the final IRIX hardware is thoroughly useless performance-wise.</p>
<p>Enough of the background though, I didn&#8217;t want to turn this into a sappy rant over a dead platform. I&#8217;ve been porting stuff to IRIX for a decade at least, and as an exercise I find it very enjoyable. I&#8217;ve often found myself porting stuff over only to never use it, or even nuke it, purely for the sake of the exercise itself. I find it generally makes you a more &#8216;aware&#8217; C (and UNIX) programmer, especially of stuff outside the realm of Linux and GCC. I ran into a really cool one late last night. I&#8217;ve been working with a bunch of caching engines, on all layers of the equation (object, code, front end), and while I do all of my actual performance testing in Linux land, I&#8217;ve been doing some IRIX ports for shiggles. I ran into a pretty basic mmap call, which is what spurred me to write this article:</p>
<blockquote><p>return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);</p></blockquote>
<p>Pretty basic. MAP_PRIVATE specifies that modifications to our memory map are private: our changes won&#8217;t make it back to the underlying file, it&#8217;s copy-on-write. MAP_ANONYMOUS (at least to me) always seemed like a given too; on Linux/BSD/Solaris it ignores the file descriptor and just gives you zeroed memory. Operating systems all seem to implement this differently. Just reading the OS X man page, it looks like you can pass flags to the Mach VM about tags and purgeability. IRIX (and HP-UX it seems) completely lacks it. No biggie! Sure enough, we can accomplish the same thing thusly:</p>
<blockquote><p>static int devZero;<br />
devZero = open(&quot;/dev/zero&quot;, O_RDWR); /* O_RDWR comes from &lt;fcntl.h&gt;; it replaces the magic number 2 */<br />
return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, devZero, 0);</p></blockquote>
<p>Easy peasy. Open a file descriptor to /dev/zero and the kernel knows what to do. Worked like a charm, fast in-memory caching was a go!</p>
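<p>If you want one code path for both worlds, the two variants above can be folded into a single helper. This is only a sketch: anon_map() is a name I made up for illustration, and it assumes that a platform without anonymous mappings simply doesn&#8217;t define MAP_ANONYMOUS (or MAP_ANON) in its headers.</p>

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Some systems spell the flag MAP_ANON instead of MAP_ANONYMOUS. */
#if !defined(MAP_ANONYMOUS) && defined(MAP_ANON)
#define MAP_ANONYMOUS MAP_ANON
#endif

/*
 * anon_map() is a hypothetical helper name, not taken from any project.
 * Returns zero-filled, private, read/write memory, or MAP_FAILED on error.
 */
void *anon_map(size_t size)
{
#ifdef MAP_ANONYMOUS
    return mmap(NULL, size, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
#else
    /* Fallback for systems without anonymous mappings: map /dev/zero. */
    void *p;
    int fd = open("/dev/zero", O_RDWR);

    if (fd < 0)
        return MAP_FAILED;
    p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    return p;
#endif
}
```

<p>Closing the descriptor right after the mmap also avoids keeping a static fd around, since the mapping does not depend on it once it exists.</p>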
<div id="attachment_630" class="wp-caption aligncenter" style="width: 553px"><img class="size-full wp-image-630" title="Screen-shot-2011-04-13-at-9.33" src="http://kavassalis.com/wp-content/uploads/2011/04/Screen-shot-2011-04-13-at-9.33.jpg" alt="" width="543" height="92" /><p class="wp-caption-text">Ok so this is an OS X portability issue, but IRIX lacks daemon() too. The error was just too good to not include in a portability article :P</p></div>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>Let&#8217;s talk about some other Linux/IRIX/UNIX/whateverisms. I&#8217;m just going to skim the surface but here we go&#8230;.</p>
<p style="text-align: left;">You also run into lots of cases where IRIX&#8217;s libc does not support many &#8216;givens&#8217; in the Linux (and even FreeBSD, and sometimes even Solaris) libc. I was working on porting varnish to IRIX on Sunday morning. We&#8217;re still in the syscall tracing phase due to some misbehaving of its internal code compiler, but I&#8217;ll use it as an example, as it had two of the most common functions with easy substitutions that I see all the time:</p>
<p><strong>setenv()</strong> setenv was added to UNIX version 7 back in 1979, somehow IRIX completely lacks it. It&#8217;s pretty easy to throw putenv in for most cases though. Looking at the varnish source we swap:</p>
<pre>        AZ(setenv("TZ", "UTC", 1));</pre>
<p>for</p>
<pre>        AZ(putenv("TZ=UTC"));</pre>
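<p>For the cases where a straight swap doesn&#8217;t cut it (a computed value, or the overwrite flag actually matters), a small shim on top of putenv works. This is a sketch, not what varnish does: compat_setenv() is a made-up name. The one thing to remember is that putenv keeps the pointer you hand it, so the string has to outlive the call; here it is heap-allocated and deliberately never freed.</p>

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * compat_setenv() is a hypothetical stand-in for setenv() built on putenv().
 * putenv() stores the pointer it is given, so the "NAME=value" string is
 * heap-allocated and intentionally never freed.
 */
int compat_setenv(const char *name, const char *value, int overwrite)
{
    char *entry;
    size_t len;

    if (name == NULL || value == NULL || *name == '\0' ||
        strchr(name, '=') != NULL)
        return -1;
    if (!overwrite && getenv(name) != NULL)
        return 0;                           /* keep the existing value */

    len = strlen(name) + strlen(value) + 2; /* '=' plus trailing NUL */
    entry = malloc(len);
    if (entry == NULL)
        return -1;
    sprintf(entry, "%s=%s", name, value);   /* fits: len was sized for it */
    return putenv(entry) == 0 ? 0 : -1;
}
```

<p>With that in place, the varnish line could stay as AZ(compat_setenv("TZ", "UTC", 1)) on platforms that lack the real thing.</p>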
<p>Same goes for <strong>wait4()</strong>. It comes from 4.3BSD and is not specified by POSIX, so IRIX completely lacks it. Thankfully it&#8217;s what waitpid() is implemented with on many platforms (BSD), so if the rusage pointer is NULL, we can do a direct swap thusly:</p>
<pre>        r = wait4(v-&gt;pid, &amp;status, 0, NULL);</pre>
<p>becomes</p>
<pre>        r = waitpid(v-&gt;pid, &amp;status, 0);</pre>
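<p>If you&#8217;d rather not touch every call site, the same swap can live in one wrapper. Again just a sketch: compat_wait4() and reap_child_exit_code() are names I invented, and the wrapper only covers the rusage == NULL case discussed above (anything else fails with ENOSYS instead of pretending to work).</p>

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct rusage;  /* only ever used as an opaque pointer here */

/* Hypothetical wait4() stand-in: valid only when no rusage is requested. */
pid_t compat_wait4(pid_t pid, int *status, int options, struct rusage *ru)
{
    if (ru != NULL) {
        errno = ENOSYS;  /* waitpid() cannot report resource usage */
        return (pid_t)-1;
    }
    return waitpid(pid, status, options);
}

/* Demo: fork a child that exits with `code`, reap it, return that code. */
int reap_child_exit_code(int code)
{
    int status = 0;
    pid_t child = fork();

    if (child < 0)
        return -1;
    if (child == 0)
        _exit(code);
    if (compat_wait4(child, &status, 0, NULL) != child)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```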
<p>Varnish needed a few other tweaks, stuff like IOV_MAX isn&#8217;t specified in any IRIX headers, but can be gleaned from sysconfig (1024). CLOCK_MONOTONIC is not defined, but thankfully they had Solaris conditionals around it that became Solaris and IRIX conditionals.  Fun.</p>
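<p>For the IOV_MAX case, a guarded lookup keeps the code building everywhere. This is a sketch under assumptions: iov_max() is a made-up name, and I have not checked whether IRIX actually exposes _SC_IOV_MAX through sysconf(), which is why that branch is wrapped in its own #ifdef. The final fallback of 16 is the _XOPEN_IOV_MAX minimum.</p>

```c
#include <limits.h>
#include <unistd.h>

/* Hypothetical helper: best available answer for the iovec count limit. */
long iov_max(void)
{
#ifdef IOV_MAX
    return IOV_MAX;             /* the headers already tell us */
#else
#ifdef _SC_IOV_MAX
    long n = sysconf(_SC_IOV_MAX);

    if (n > 0)
        return n;               /* runtime answer from the system */
#endif
    return 16;                  /* _XOPEN_IOV_MAX, the guaranteed minimum */
#endif
}
```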
<p>Onto GCCisms. GCC is a great compiler suite. It supports a ton of software/hardware, but it will never be the fastest on any platform. On SGIs we have MIPSpro and on PCs you have Intel&#8217;s really awesome compiler suite. (We used to use Intel&#8217;s suite at Cedara for our x86 medical imaging software, strictly based on its killer performance). I tend to try and port stuff to MIPSpro unless I run into too many bad GCCisms. Sometimes I think that GCC being the de facto &#8216;nix compiler has led people (likely CS students) to believe that&#8217;s just &#8220;the way things are&#8221;.</p>
<p><strong>Zero length arrays </strong>i.e.</p>
<pre>char contents[0];</pre>
<p>OK, so it is a clever way to throw a placeholder into a struct, but this could be totally handled with a char *contents; pointer too. Not portable though&#8230;</p>
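<p>If the trailing-buffer layout is really what you want, C99 standardised it as the flexible array member, spelled with empty brackets instead of [0]. A minimal sketch follows; struct blob and blob_new() are made-up names, and whether your particular non-GCC compiler accepts it depends on its C99 support.</p>

```c
#include <stdlib.h>
#include <string.h>

struct blob {
    size_t len;
    char contents[];  /* C99 flexible array member, no GCC extension needed */
};

/* Allocate the header and the payload in a single malloc() call. */
struct blob *blob_new(const char *data, size_t len)
{
    struct blob *b = malloc(sizeof *b + len);

    if (b == NULL)
        return NULL;
    b->len = len;
    memcpy(b->contents, data, len);
    return b;
}
```

<p>One free() releases the whole thing, which is the main attraction over the separate char * pointer.</p>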
<p><strong>Variable length arrays</strong> i.e.</p>
<pre>char str[strlen (s1) + strlen (s2) + 1];</pre>
<p>Again, it&#8217;s a clever way to do really basic memory handling without thinking about it, but ugh. C99 does support variable length arrays, but GCC allows them in C89 and C++ code too, just to muck shit up. Don&#8217;t use these outside of C99 code, it will not be portable.</p>
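<p>The boring portable version of that buffer is a malloc sized the same way. A quick sketch (concat() is just an illustrative name), with the obvious trade-off that the caller now has to free the result:</p>

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated s1+s2, or NULL on allocation failure. */
char *concat(const char *s1, const char *s2)
{
    size_t n1 = strlen(s1);
    size_t n2 = strlen(s2);
    char *str = malloc(n1 + n2 + 1);

    if (str == NULL)
        return NULL;
    memcpy(str, s1, n1);
    memcpy(str + n1, s2, n2 + 1);  /* also copies s2's terminating NUL */
    return str;
}
```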
<p>There are so many more, but these are the most common I tend to see. <a href="http://web.mit.edu/gnu/doc/html/gcc_8.html" target="_blank">GNU has a guide on their extensions to the C language</a>, and totally there is lots of really useful stuff there, but just know that it breaks portability. This rant would have a lot more meaning if people used much other than GCC, but nowadays even embedded development and video game consoles are moving toward GCC. At this point in time I honestly don&#8217;t see another compiler suite beating GCC but who knows&#8230;</p>
<p>Code, port, hack. Expect more cooking stuff soon :P</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://kavassalis.com/2011/04/porting-gcc-extensions-missing-functions-and-more-fun/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Linux and the maximum number of processes (threads)</title>
		<link>http://kavassalis.com/2011/03/linux-and-the-maximum-number-of-processes-threads/</link>
		<comments>http://kavassalis.com/2011/03/linux-and-the-maximum-number-of-processes-threads/#comments</comments>
		<pubDate>Thu, 03 Mar 2011 22:10:57 +0000</pubDate>
		<dc:creator>nick</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[debug]]></category>
		<category><![CDATA[kernel]]></category>
		<category><![CDATA[limits]]></category>
		<category><![CDATA[Linux]]></category>

		<guid isPermaLink="false">http://kavassalis.com/?p=437</guid>
		<description><![CDATA[So last night we  were debugging some odd Apache behaviour when we discovered that RHEL6 has modified the default number of processes (which includes threads in linux, but I&#8217;ll refer to it as procs for the purpose of this article). In RHEL5 (and prior), each user is given the default fork init number of threads [...]]]></description>
			<content:encoded><![CDATA[<p>So last night we were debugging some odd Apache behaviour when we discovered that RHEL6 has modified the default number of processes (which includes threads in Linux, but I&#8217;ll refer to them as procs for the purpose of this article). In RHEL5 (and prior), each user is given the default fork init number of threads (which we&#8217;ll discuss shortly). It&#8217;s a huge number usually, 106,496 threads on my RHEL5 dev box here. RHEL6, on the other hand, drops a file into /etc/security/limits.d that limits all users to 1024 procs.</p>
<p style="text-align: center;"><a href="http://mlkshk.com/r/RFS" target="_blank"><img class="aligncenter" src="http://mlkshk.com/r/RFS" alt="" width="554" height="79" /></a></p>
<p>Now, /etc/security/limits.conf and /etc/security/limits.d are read by PAM&#8217;s pam_limits.so, so only things that use PAM will ever touch these. Apache starting on a server reboot is not affected by these limits, so Apache will have access to its full 106,496 threads. But if root restarts apache from a shell (/etc/init.d/httpd restart or service httpd restart), Apache will inherit root&#8217;s proc limit of 1024! This can be verified with a &lt;?php passthru(&#39;ulimit -u&#39;); ?&gt; via mod_php.</p>
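<p>If you don&#8217;t have mod_php handy, the same check can be done from C with getrlimit(). A small sketch: print_nproc_limit() is a made-up name, and RLIMIT_NPROC is a Linux/BSD resource rather than a POSIX one, so don&#8217;t expect this to build everywhere.</p>

```c
#include <stdio.h>
#include <sys/resource.h>

/* Print one limit value, spelling out RLIM_INFINITY as "unlimited". */
static void print_limit(const char *label, rlim_t value)
{
    if (value == RLIM_INFINITY)
        printf("%s: unlimited\n", label);
    else
        printf("%s: %llu\n", label, (unsigned long long)value);
}

/* Hypothetical helper: show the calling process's nproc limits. */
int print_nproc_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NPROC, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    print_limit("soft (ulimit -u)", rl.rlim_cur);
    print_limit("hard", rl.rlim_max);
    return 0;
}
```

<p>Limits are inherited across fork and exec, so running this from the same shell you restart Apache from shows what the daemon is going to get.</p>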
<p><a href="http://mlkshk.com/p/RFT"  target="_blank"><img class="alignleft" src="http://mlkshk.com/r/RFT" alt="" width="373" height="76" /></a>1024 threads is totally not enough for reasonable operation in Apache. It will begin to kill children off as they setuid() to its designated user, as the call fails with errno 11 (EAGAIN). (setuid() has not always performed an nproc check; this was added in some 2.6 kernel to prevent people from sneaking over the limit.) You can work around this by dropping a <em>ulimit -u &lt;much higher number&gt;</em> into /etc/sysconfig/httpd, but my preferred solution is to remove (blank) RHEL6&#8217;s new /etc/security/limits.d/90-nproc.conf . (Or I guess you could just never restart your apache, best to reboot the system if there&#8217;s a problem :P) Thanks Red Hat.</p>
<p>Back to that large 106,496 number. That&#8217;s the maximum number of procs that nick as a user can spawn; even using ulimit -u he can&#8217;t go higher than that. The system&#8217;s maximum number of threads (visible in /proc/sys/kernel/threads-max) is 212992 (exactly double), and it is decided at boot time by the kernel. I decided to do some digging in the kernel source, and the maximum number of threads is calculated thusly:</p>
<p style="padding-left: 30px;">max_threads = totalram_pages / (8 * THREAD_SIZE / PAGE_SIZE);</p>
<p style="padding-left: 30px;"><em>(defined in kernel/fork.c, called by init/main.c)</em></p>
<p>PAGE_SIZE is architecture specific; for x86 it&#8217;s 4 KB (and 4 MB, but that&#8217;s a discussion for another day), and is calculated via:</p>
<p style="padding-left: 30px;">#define __AC(X,Y)       (X##Y)<br />
#define _AC(X,Y)        __AC(X,Y)<br />
#define PAGE_SHIFT      12<br />
#define PAGE_SIZE       (_AC(1,UL) &lt;&lt; PAGE_SHIFT)</p>
<p>PAGE_SHIFT is a constant for x86, other architectures support alternate (larger) page sizes. THREAD_SIZE is also architecture specific:</p>
<p style="padding-left: 30px;">#define THREAD_ORDER    1<br />
#define THREAD_SIZE  (PAGE_SIZE &lt;&lt; THREAD_ORDER)</p>
<p>THREAD_ORDER is 1 in x86, but can vary depending on the arch. These result in a PAGE_SIZE of 4096 and a THREAD_SIZE of 8192 for x86. Back to the formula  above, we can fill it in as:</p>
<p>&nbsp;</p>
<p style="padding-left: 30px;">max_threads = totalram_pages / (8 * 8192 / 4096);</p>
<p style="padding-left: 30px;">becomes</p>
<p style="padding-left: 30px;">max_threads = totalram_pages / 16 ;</p>
<p>Calculating totalram_pages is a bit tricky since it&#8217;s not just the full system RAM allocated to pages. The easiest way is to calculate it from /proc/zoneinfo, by summing the spanned pages:</p>
<p style="padding-left: 30px;">cat /proc/zoneinfo | grep spanned | awk &#39;{totalpages=totalpages+$2} END {print totalpages}&#39;<br />
3407872</p>
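<p>The same sum in C, in case you want it inside a tool rather than a one-liner. This is a sketch: sum_spanned_pages() is a name I made up, and it assumes the "spanned" lines look the way they do in the awk pipeline above (the keyword followed by a page count).</p>

```c
#include <stdio.h>

/* Add up every "spanned <n>" line from a /proc/zoneinfo-style stream. */
unsigned long sum_spanned_pages(FILE *fp)
{
    char line[256];
    unsigned long total = 0;
    unsigned long n;

    while (fgets(line, sizeof line, fp) != NULL) {
        /* The leading space in the format skips the line's indentation. */
        if (sscanf(line, " spanned %lu", &n) == 1)
            total += n;
    }
    return total;
}
```

<p>Call it with a handle from fopen("/proc/zoneinfo", "r") and you should get the same number the pipeline prints.</p>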
<p>So on my server with 12 gigabytes of memory we&#8217;ve got:</p>
<p style="padding-left: 30px;">max_threads =  3407872 / 16;</p>
<p style="padding-left: 30px;">becomes</p>
<p style="padding-left: 30px;">max_threads = 212992;</p>
<p>Now the default maximum number of threads is designed so that thread structures alone can only consume half of the memory. But one user should also not be able to consume all of the threads on the system. Back in kernel/fork.c we have:</p>
<p style="padding-left: 30px;">init_task.signal-&gt;rlim[RLIMIT_NPROC].rlim_cur = max_threads/2;<br />
init_task.signal-&gt;rlim[RLIMIT_NPROC].rlim_max = max_threads/2;</p>
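<p>To sanity check the arithmetic, here are the formulas above as plain userland C. It&#8217;s a sketch with the x86 constants hard-coded; the X86_ prefixes and the function names are mine (the prefix only avoids clashing with any PAGE_SIZE a system header might define), everything else mirrors the kernel snippets quoted in this post.</p>

```c
#include <stdio.h>

/* x86 values from the kernel snippets above, renamed with an X86_ prefix. */
#define X86_PAGE_SHIFT   12
#define X86_PAGE_SIZE    (1UL << X86_PAGE_SHIFT)              /* 4096 */
#define X86_THREAD_ORDER 1
#define X86_THREAD_SIZE  (X86_PAGE_SIZE << X86_THREAD_ORDER)  /* 8192 */

/* max_threads = totalram_pages / (8 * THREAD_SIZE / PAGE_SIZE) */
unsigned long max_threads_for(unsigned long totalram_pages)
{
    return totalram_pages / (8 * X86_THREAD_SIZE / X86_PAGE_SIZE);
}

/* The per-user default: RLIMIT_NPROC starts out as max_threads / 2. */
unsigned long default_nproc_for(unsigned long totalram_pages)
{
    return max_threads_for(totalram_pages) / 2;
}
```

<p>Feeding it the 3407872 pages from my box gives 212992 for max_threads and 106496 for the per-user limit, matching threads-max and ulimit -u.</p>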
<p><a href="http://mlkshk.com/p/RFU"  target="_blank"><br />
<img class="alignright" src="http://mlkshk.com/r/RFU" alt="" width="373" height="107" /></a></p>
<p>So there you have it: the ulimit -u for nick (or root) is now set to 106496 procs (threads). nick is unable to take up all of the processes on a box; even using ulimit, he cannot raise his limit past max_threads/2 (106496), though root can. Unless modified via /proc/sys/kernel/threads-max, a maxed out system will not use more than 50% of the memory to store thread structures. Hopefully you too now know a little bit more about Linux process limits! Big thanks to colleague and friend Jim Hull for his troubleshooting and major help last night.</p>
<p><em>The exact formulas may vary kernel to kernel; my findings were through the kernel 2.6.37 source and verified on RHEL5&#8217;s heavily modified 2.6.18-194 kernel.</em></p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://kavassalis.com/2011/03/linux-and-the-maximum-number-of-processes-threads/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Of Nick and hosting geo-diversity&#8230;</title>
		<link>http://kavassalis.com/2010/02/of-nick-and-hosting-geo-diversity/</link>
		<comments>http://kavassalis.com/2010/02/of-nick-and-hosting-geo-diversity/#comments</comments>
		<pubDate>Fri, 26 Feb 2010 15:55:27 +0000</pubDate>
		<dc:creator>nick</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[hosting]]></category>
		<category><![CDATA[internet]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[telecom]]></category>

		<guid isPermaLink="false">http://kavassalis.org/?p=29</guid>
		<description><![CDATA[If you look at the biggest websites and internet applications, you can pretty much divide them into two groups. Those that are geographically diverse and those that aren&#8217;t. It&#8217;s kinda shocking that in 2010, the majority of major internet properties still are located in a single (large) datacenter. Though to be fair there is a [...]]]></description>
			<content:encoded><![CDATA[<p>If you look at the biggest websites and internet applications, you can pretty much divide them into two groups: those that are geographically diverse and those that aren&#8217;t. It&#8217;s kinda shocking that in 2010, the majority of major internet properties are still located in a single (large) datacenter. Though to be fair there is a good reason for that: geo-diversity has many challenges. Problems like directing traffic to the fastest/closest/cheapest/most available location are pretty easy to solve: most people go with BGP anycast, targeted DNS responses, or a combination of both. The real challenge though is making sure your actual served content is coherent among all the locations. It would be terrible for a user to upload a photo and send the URL to their friends, only for the friends to see nothing or, worse, the wrong image.</p>
<p>For static content this is easy; even rsync will be scalable enough to push out changes to your content amongst your farm. User uploaded content is quite a bit trickier. Within a single datacenter you can efficiently (though not always affordably) solve this using shared storage, iSCSI or NFS. Then applications pretty much can work as if they&#8217;re on a single server, and session management can be tackled by using cookie or host persistence on the load balancers to make sure a user stays on the same server. What about servers in different locations though? NFS and iSCSI will not be terribly effective over transit.</p>
<p>You will have to push content between your locations then. If you are trying to geographically distribute your own application, you would just write functionality in to immediately push any user uploaded content out to other locations as it&#8217;s created.  Google/YouTube are great examples of this. When you hit content they&#8217;ve hosted, it isn&#8217;t even hosted on every server, and they direct you to the closest server that has said content. If that content isn&#8217;t available locally to you yet, or at all, they can stream it over their own fiber backhaul and out your closest Google POP.</p>
<p>But what if you are hosting a variety of 3rd party software? To my knowledge none of the popular blog packages, forum software, etc. has any sort of geo-diversity designed into them. You could of course fork them and write your own, but then you end up supporting N different software packages for your N clients, which is neither affordable nor reasonable.  Rsync would do this task, but unfortunately it is very intensive and doesn&#8217;t scale particularly well, because it has to walk your entire tree on every run (and checksum every file if you run it with --checksum) to see if things changed. As your content scales, the rsyncs would get slower and slower just seeing if changes occurred, eventually leading to massive delays on syncing out user created content.</p>
<p>In the end, it&#8217;s a cool problem, a problem that not too many people have tackled so far. We came up with our own solution, which I unfortunately probably shouldn&#8217;t disclose. I wrote the basis of the software last month, though it still needs some bug fixes, testing and more modules to be written for it. It is a difficult problem to tackle, but having worked in telecommunications, I know that no facility is bulletproof, no power is bulletproof, no connectivity is bulletproof, no hardware is bulletproof: geo-diversity is a must going forward in this highly demanding world where everyone expects connectivity and content 24/7.</p>
]]></content:encoded>
			<wfw:commentRss>http://kavassalis.com/2010/02/of-nick-and-hosting-geo-diversity/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

