January 14, 2004 Edition

By Jorge Castro, Peter Bright, Matt Thrailkill


Microsoft Services for Unix

We're back for another edition of Linux.Ars. In this issue we're going to take a look at Microsoft's Services for Unix, which will soon be available for free, bringing joy to cross-platformers everywhere. Then we're going to take a quick tour of AMD64 support in Linux, and touch on an easy way to keep an eye on that Apache server. We've got tons to cover, so strap in as Linux.Ars breaks new ground and explores the Microsoft side of the house.


Microsoft releases Services for Unix as free beer

Since its inception, one of the features of Windows NT has been its POSIX layer. This is a small set of libraries implementing the core parts of the POSIX specification (POSIX being the formal specification describing the APIs, utilities, and shell that a system must provide to be called UNIX). In the olden days, the POSIX layer was a rather sorry affair; the basic APIs were there, but not a lot else, and next to no one used them.

To bolster the weak POSIX layer capabilities, Microsoft provided something called Services For UNIX (SFU). This included a host of extra functionality: NFS client and server support, password synchronization, a telnet daemon (which back in the NT 4 days was more useful than it is today, as a telnet daemon is now a standard part of Windows 2000 and 2003), and a ksh-like shell licensed from MKS. Back then, these parts for the most part ignored the POSIX layer too.

But a company called Interix did not ignore the POSIX layer. Instead, they wrote a replacement for it. This replacement was far more full-featured than the default layer; it came with ports of many GNU tools (perhaps most importantly, gcc), and used pd-ksh. Interix was later acquired by Microsoft, and the Interix POSIX layer, shell, and utilities became a part of SFU as of version 3.

Within the next week or two, SFU 3.5 will be released (we have heard variously the 15th and 22nd of January; we will see). SFU 3.5 is of particular interest for a few reasons. The simplest reason is its price; traditionally SFU cost money, about US$99, but as of version 3.5 it will now be gratis.

For those working in mixed environments, the NFS and NIS functionality are perhaps most useful; NFS becomes integrated into Windows, allowing one to browse NFS shares in Network Neighborhood in the way that Windows users expect. NFS in 3.5 appears to be both faster and more reliable than previous versions, and could probably be used as an effective alternative to SMB/CIFS for mixed-environment filesharing.

As for making things more Unixy, SFU 3.5 builds on the work of 3.0. Again, it provides a complete replacement for the NT POSIX subsystem. It offers considerably more APIs than its predecessor; of particular interest is that SFU 3.5 supports pthreads and comes with a fairly recent (3.x) version of gcc, so one can now compile a broader selection of software. Alternatively, you can download a small selection of precompiled tools from Interop Systems.

For the NFS client and server facilities, SFU 3.5 is almost certainly the best option for Windows; that it significantly undercuts its competitors is reason enough for that. But as a shell, things get a bit more interesting. SFU is not the only way to get a *nix-style shell on Windows. Cygwin and UWIN are two more *nix-style shells for Windows, and there are others that can be used.

SFU, Cygwin, and UWIN all provide a *nix-like environment for Windows. SFU is set apart from the others in that it's a genuine NT subsystem. Cygwin and UWIN are both Win32 libraries, sitting above (and written using) Win32. SFU is not; it sits alongside Win32 rather than on top of it. This allows SFU to avoid issues that the Cygwin developers have had to address regarding, for instance, case sensitivity. Cygwin is just a Win32 program, and works on Windows 9x; as such, it is not case-sensitive (there is or was some way of turning on case sensitivity, but it is not the default and seems to be little talked about), because on Windows 9x it can't be. NT, on the other hand, is case-sensitive (though the Win32 subsystem is not), and as a result, so is SFU. One has to be a little careful with case sensitivity nonetheless, as Win32 programs still do not expect it, but it is there all the same.

Another difference, and one likely to be particularly pleasing to Unix diehards, is that SFU uses no ".exe" file extensions. Both Cygwin and UWIN perform internal translation so that if one runs, for instance, "ls" then what actually gets run is "ls.exe". In SFU there's no extension; ls is called ls. Executability is determined the Unix way: by the permissions on the file. A +x file is executable, anything else is not. There are also differences in the filesystem layouts. All the environments emulate a Unix-style single-rooted filesystem; for example, Cygwin mounts its own directory as "/" and uses its /cygdrive mountpoint to access drive letters. SFU uses double slash-preceded letters for drives (for example, //C/Windows would be equivalent to C:\Windows). And there are lots of other little changes, like the handling of /proc and other special places (SFU's is vanilla, Cygwin's has some of the extra things that one finds on Linux).
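To make the two drive-letter conventions concrete, here is a minimal Python sketch that maps a Windows path to each form and checks the Unix execute bits. The function names are made up for illustration, and the lowercase drive letter under /cygdrive is an assumption about Cygwin's usual behaviour; neither environment exposes these helpers.

```python
import ntpath
import stat


def _split_drive_letter(win_path):
    """Split 'C:\\Windows' into ('C', '\\Windows'); reject anything else."""
    drive, rest = ntpath.splitdrive(win_path)
    if len(drive) != 2 or drive[1] != ":":
        raise ValueError("expected a drive-letter path, got %r" % (win_path,))
    return drive[0], rest


def to_sfu_path(win_path):
    """Hypothetical helper: C:\\Windows -> //C/Windows, as described above."""
    letter, rest = _split_drive_letter(win_path)
    return "//" + letter + rest.replace("\\", "/")


def to_cygwin_path(win_path):
    """Hypothetical helper: C:\\Windows -> /cygdrive/c/Windows.

    Lowercasing the drive letter is an assumption, not taken from the article.
    """
    letter, rest = _split_drive_letter(win_path)
    return "/cygdrive/" + letter.lower() + rest.replace("\\", "/")


def has_execute_bit(mode):
    """Unix-style test: a file is executable if any +x permission bit is set."""
    return bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))
```

Calling to_sfu_path(r"C:\Windows") gives "//C/Windows", matching the example in the text. A real tool would also need to handle UNC paths and relative paths, which this sketch simply rejects.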

Which environment one prefers is going to depend on which tools and features one wants, and which filesystem layout one prefers. I find myself increasingly using the SFU shell instead of Cygwin, because it feels (in some hard to nail down sense) more "correct." It doesn't have the filename munging, case insensitivity, or file extensions. It integrates better with Windows (for example, its "ps" can tell me more information about more processes), so it is more useful for managing Windows things, and it integrates better with other parts of SFU, such as NFS. Cygwin's killer feature is its X Server, but that's all I find myself using it for. Cygwin is also increasingly supported as a target for much open source software, which Interix currently is not. I hope that this will change in the future, especially as SFU should become far more widespread now.

All in all, SFU 3.5 represents a pretty compelling package for the Windows user. Making it free is an interesting step from MS. They are doing it to promote migration from Unix to Windows (and certainly, the improved POSIX support and GNU toolchain will be useful in that regard), but one can't help but feel that in so doing, they're also helping people migrate away from Windows: NFS and NIS support mean that it'll play nicer in mixed environments. There is also some talk of a future version actually being bundled with Windows; this appears to have been the purpose of the license MS bought from SCO. Whichever way people migrate, SFU 3.5 is a welcome update, and highly recommended to anyone wanting to make their Windows machine more Unixy.


The state of Linux on AMD64

The Linux kernel was ported to x86-64 (AMD64) before there was any actual silicon to even try and run it on. The port was quick in part due to Linux's open source nature and also because Linux already runs on many other more exotic 64-bit systems. The big vendors like SUSE, Mandrake, and Red Hat had AMD64 releases of their distributions announced before the hardware could be bought by normal consumers as well.

Because most Linux software is open source, moving to a new architecture is relatively painless. In most cases all that is needed is a simple recompile, and most large apps have already been built and cleaned by other people on other 64-bit architectures. This means that 64-bit desktop computing is a real solution now; no waiting for a large vendor to drag its feet on an OS release, no hoping for more than a trickle of applications that will be able to make use of the larger address space, native 64-bit integers, and extra general purpose registers that AMD64 offers.

The i386 architecture has already been revised a few times, most notably during the change from 16-bit to 32-bit. The revision allowed for a 16-bit compatibility mode known as real mode, and added a single new mode known as protected mode. In the change to a 64-bit architecture, AMD basically added two more processor modes in addition to the traditional i386 ones. Long mode is the 64-bit mode; it provides 64-bit registers, instructions, and twice as many GPRs. Compatibility mode provides protected mode in a long mode environment; this is what allows the processor to run old i386 applications, but only if they are 32-bit. There is no 16-bit DOS stuff; if you want that, you'd have to reboot out of a long mode kernel altogether and use another operating system, like Windows 2000/XP, Windows 9x, or DOS.

The 64-bit registers are what allow programs to address more memory. This is because registers hold memory addresses, among other things. So since you can put larger numbers into registers, you have more possible memory addresses to use. Being able to put 64-bit numbers into the registers is also handy for math. Whereas an i386 processor may have had to perform tricks to do 64-bit integer math, it can now be done easily with the larger registers and the updated instructions.
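As a rough illustration of the kind of "trick" a 32-bit processor needs for 64-bit integer math, the Python sketch below adds two 64-bit values using only 32-bit halves and an explicit carry, which is essentially what an add followed by an add-with-carry does on i386. The function name is made up for this example, and Python's own integers are arbitrary precision, so the masking is only there to imitate fixed-width registers.

```python
MASK32 = 0xFFFFFFFF  # the largest value a 32-bit register can hold


def add64_with_32bit_halves(a, b):
    """Add two unsigned 64-bit values using only 32-bit-wide pieces."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    # Step 1: add the low halves; the result may overflow 32 bits.
    lo = a_lo + b_lo
    carry = lo >> 32  # 1 if the low-half add overflowed, else 0

    # Step 2: add the high halves plus the carry, wrapping at 32 bits.
    hi = (a_hi + b_hi + carry) & MASK32

    # Reassemble the 64-bit result from the two halves.
    return (hi << 32) | (lo & MASK32)
```

With 64-bit registers the same operation is a single add instruction. The same widening applies to addresses: 32 bits can name 2**32 bytes (4 GiB), while 64 bits can in principle name 2**64.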

The Linux kernel now runs just fine in long mode, and has no trouble letting old binaries run in compatibility mode. One advantage of this is that it is pretty simple to retool an existing i386 Linux install for AMD64. One can simply drop an AMD64 kernel on top of the old one, and it will boot into long mode just fine. The rest of the system will be running in compatibility mode, though. This is fine, because that mode is faster than a traditional Athlon anyway. What would be the point of doing this? Once you had such a setup, you could simply throw together a custom chroot for whatever 64-bit thing you wanted to run: a huge database, some scientific app, or a sandboxed custom Debian install, for example. Inside the chroot the 64-bit software will run just as fast as if everything on the system was 100% 64-bit. All the code being executed is 64-bit all the way down to hitting the kernel. Any shared libs the program might need are worked out by ld.so; it will fetch the right 32-bit or 64-bit lib if it can. Obviously this speed and flexibility makes migrations much simpler, which is why a lot of people have been buzzing about the technology and lusting after it.

There is a downside though. To be able to run both types of binaries, you will need libraries for them to link into as well. If you want to be able to run a broad range of applications, this can quickly grow into a lot of redundancy. Even though hard drives are pretty large nowadays, it still provides a few headaches as far as management goes. The glibc, gcc, and x86-64.org people decided that 64-bit libs go into /lib64 (even if a system is all 64-bit) and the 32-bit libs go into /lib. These are the places ld.so looks at runtime based on what kind of binary you are trying to run. There are loader programs, linux32 and linux64, which are used to manually tell the machine which type of executable you are using. They munge the return value of uname and fix some environment variables and so forth, in case shell scripts might be involved. Packaging systems like Portage and Apt have not yet caught up to the intricacies of /lib and /lib64; they need bi-arch support for things to work right. Right now, the tools do not understand the idea of having the same program installed twice, for two different architectures. In the case of Gentoo, this is not too big a deal because almost everything builds fine from source and you rarely need 32-bit binaries. In the case of Debian, this is a big holdup for proper AMD64 support.
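The split between /lib and /lib64 depends on being able to tell the two kinds of binary apart. As a simplified sketch (not how ld.so is actually implemented), the Python below reads the class byte from an ELF header, where 1 marks a 32-bit object and 2 a 64-bit one, and maps it to the directory convention described above. The function names are hypothetical, and a real loader also looks at the machine type, its cache, and configured search paths.

```python
ELF_MAGIC = b"\x7fELF"

# ELF class byte (offset 4 in the header) -> library directory convention
# described in the article. 1 is ELFCLASS32, 2 is ELFCLASS64.
LIB_DIR_BY_CLASS = {1: "/lib", 2: "/lib64"}


def library_dir_for(header):
    """Return the library directory for the first bytes of an ELF file."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    elf_class = header[4]
    if elf_class not in LIB_DIR_BY_CLASS:
        raise ValueError("unknown ELF class %d" % elf_class)
    return LIB_DIR_BY_CLASS[elf_class]


def library_dir_for_file(path):
    """Convenience wrapper: read just enough of the file to classify it."""
    with open(path, "rb") as f:
        return library_dir_for(f.read(5))
```

On a bi-arch system, running library_dir_for_file on a binary would be one quick way to see which of the two library trees it is expected to link against.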

Normally, you'd also need new drivers; if you plan to run 64-bit Windows XP, you will certainly have some hardware support headaches, because all those binary drivers for every obscure garbage-bin hardware device will no longer work. Microsoft and AMD are reportedly doing a fair job persuading vendors to port their drivers to AMD64. Even then, the resulting library of drivers will likely be far smaller. In the case of Linux, most drivers are open and work fine most of the time. You compile your kernel, pick the driver, and the hardware works. Even NVIDIA has been releasing AMD64 Linux drivers for their video cards since before AMD publicly released the hardware.

Overall, if you picked up an AMD64 box tomorrow, you'd have the following choices:

1) Run whatever you run now, never touch long mode

2) Run Linux

3) Run a BSD

4) Run Windows

The commercial distributions have good support, which is good for running a big honking server, but since AMD64 is still considered an "enterprise platform", the cost might be prohibitive for the hobbyist. By far, Mandrake's US$100 AMD64 version is the most attractive option. The free distributions like Debian, Gentoo, and Fedora may end up moving faster, but as of today support is still developing.

The BSDs are supposed to run well. NetBSD is claimed to be the first OS to be ported to long mode; it was ported completely inside a simulator. AMD64 is supposed to be a Tier 1 supported platform for FreeBSD, although it wouldn't boot on our box. Windows may be a requirement for some, but at this point it looks like Linux and the BSDs do far better in terms of driver support, and the open source nature of the software makes it much easier to run everything as 64-bit.

With all the fanfare around the release of the Athlon64 3400+, you may soon see an edition of Linux.Ars that does some profiling of 64-bit vs. 32-bit performance using some chroot magic. Interesting things so far should be Apache, MySQL or PostgreSQL, the usual kernel compile, and perhaps Linpack or something else purely computational. If you have any suggestions, drop them in the discussion box.


Cool App of the Week

Monitoring Apache can be tedious. While using tail -f on your logs is a good way to keep track of what is going on, there is no easy way to give yourself an overview of what your Apache server is doing in real time. Apachetop, similar to the standard top tool, makes this task easier.


Real-time statistics on our Apache server

As you can see, we can now monitor what our server is doing at a glance and keep running totals of how much bandwidth Apache is using. By default Apachetop will use your log in /var/log/apache/access.log, but if you moved your Apache log you can add a -f flag to the command and point it to wherever the log really is.
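To give a feel for the kind of running totals involved, here is a small Python sketch that tallies requests, bytes sent, and hits per URL from access log lines. It assumes the log uses Apache's Common Log Format; the function name and regular expression are made up for this example and say nothing about how Apachetop itself is implemented.

```python
import re
from collections import Counter

# Common Log Format: host ident user [time] "request" status bytes
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "([^"]*)" (\d{3}) (\d+|-)')


def summarize(lines):
    """Return (request_count, total_bytes, hits_per_url) for log lines."""
    requests = 0
    total_bytes = 0
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        _host, request, _status, size = match.groups()
        requests += 1
        if size != "-":  # "-" means no body was sent
            total_bytes += int(size)
        parts = request.split()
        if len(parts) >= 2:  # e.g. 'GET /index.html HTTP/1.1'
            hits[parts[1]] += 1
    return requests, total_bytes, hits
```

Passing an open file object for the access log to summarize would give a one-shot version of the totals; the appeal of Apachetop is that it keeps such figures updated live as new requests arrive.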