Welcome
COMMENT

We pride ourselves on the origins of our publication, which come from the early days of the Linux revolution. Our sister publication in Germany, founded in 1994, was the first Linux magazine in Europe. Since then, our network and expertise has grown and expanded with the Linux community around the world. As a reader of Linux Magazine, you are joining an information network that is dedicated to distributing knowledge and technical expertise. We're not simply reporting on the Linux and Open Source movement, we're part of it.

Changing Times

Dear Linux Magazine Reader,

As autumn is now upon us, the IT world slowly shakes off its summer lethargy. New projects are starting to appear and some are designed for our business needs.

Kroupware should be ready by the end of the year. This is a large open source groupware product that has been commissioned by one of the German governmental IT agencies. The project aims to produce a server with all the functionality of MS Exchange servers and MS Outlook 2000 clients. Why is this a good thing? Well, it is good for the IT agency, as it will get an open source alternative based on standard components. If these sources are then released, we will all gain from what would be seen as a small expenditure for one large IT department. Other governments could then use this project and gain all the benefits at no cost. Does this harm anyone? Other countries benefiting does not harm the original commissioning agency, so everyone benefits all around. Governments helping one another – this could be a good thing. As much as I distrust governments, this might be something I applaud. Now if only governments had video game departments…

A new political platform has been created by Bruce Perens after his departure from Hewlett-Packard. The Sincere Choice initiative is a response to the Microsoft-backed Software Choice initiative. The difference is that one aims at giving people the choice of being able to use proprietary software, while the other aims for the choice of proprietary software OR Free Software. The main principle of Sincere Choice is a fair, competitive market for computer software, both proprietary and Open Source. Bruce will have no complaints here. Software should live or die based on its merits, not on whether you are locked in to a proprietary monopoly.

Microsoft seem determined to stop any form of competition. First we had C# and .NET, then reading DVDs without proprietary decoders. Next came software patents, and just as we are beginning to make politicians understand how wrong this is, along comes the Palladium project. This is where a new security chip will be built onto all future motherboards. Access to, say, a secure website will be allowed only if you are using registered proprietary software. This Trusted Computing Platform will stop any evil terrorist from buying the latest Amazon blockbuster, but what about us free world people? Content you receive may be limited because you are not trusted or it is deemed bad for you, such as the opposition's political information. Will this be the way MS finally gets to control the Internet?

Survey Results
Here are some of the results from the Linux Magazine reader survey:
98% of our readers are male
62% use Linux at work
43% of our readers hold Technical Specialist positions
27% share their copy with others
24% buy just for the Community pages
76% use Linux on the desktop
9% are still living with their parents :-)

Subscription Winners
The winners from the survey draw are:
• Mr M. C. Bishop, Chirnside, UK
• Mr L. J. Kroker, Hong Kong
• Mr A. Mc Donald, Dublin, Ireland
• Ms L. Staal, Alkmaar, The Netherlands
• Mr M. Thompson, Cambus, UK
Enjoy your free subscription!

Happy Hacking,
John Southern, Editor
Software News

Mozilla leads the way
The Optimoz project is pushing the boundaries of mouse control again, this time with "mouse gestures". With this add-on you can rapidly execute common browser commands with mouse movements. An example is holding down the mouse button and then moving the mouse in a certain way to add the page as a bookmark. An example of this can now be downloaded from the optimoz.mozdev.org website.
The next step is pie menus. These can be thought of as mouse gestures with feedback. Here, sectors of a circle act as different menu choice areas, allowing context-sensitive menus. A common use is within 'The Sims' game. ■
http://optimoz.mozdev.org
Flight Gear
Flight Gear, the Linux flight simulator, has released version 0.8.0. The new version requires SimGear 0.2.0. This follows on from the release of a new online tutorial called "A Circuit in FlightGear", which is available at www.flightgear.org. The FlightGear flight simulator project is an open-source, multi-platform, co-operative flight simulator development project. Source code for the entire project is available and licensed under the GNU General Public License. The goal of the project is to create a sophisticated flight simulator framework for use in research or academic environments, for the development and pursuit of other interesting flight simulation ideas, and as an end-user application. ■
http://www.flightgear.org

BakBone lends support to UnitedLinux
BakBone Software, who provide storage management software, announced support for UnitedLinux. It aims to give a wide range of Linux support, with six platforms and forty application plugin modules for its NetVault product. The secure backup and recovery software is used by ISVs and IHVs to give a flexible solution with a modular architecture. BakBone has added an Oracle9i module to support customers using the Oracle RMAN architecture. This allows for hot backup: all backups can be performed while the database is online and applications are available to the end-user. The Oracle RMAN APM supports full, differential, incremental and consolidated backups with an easy-to-use graphical user interface. The Oracle9i RMAN APM costs $995. ■
http://www.bakbone.com
KDE Ships KOffice 1.2
The KDE Project announced the availability of KOffice 1.2. David Faure, KOffice release co-ordinator and developer, noted that the release features "an incredible number of improvements".
With a great new thesaurus, enhanced scriptability of suite components, WYSIWYG on-screen display, bi-di text, KWord mail-merge and footnotes, and KSpread database connectivity, enhanced printing and new sorting functionality, who’s to argue? And let’s not overlook the constant improvements in the filters, though the HTML import took a step back to take full advantage of KHTML’s powerful HTML parsing in the next release. Karbon14, the extremely promising vector-graphics program (with SVG support!), is not officially in this release but many of the packagers have packaged it as well. ■ http://www.koffice.org
Quanta Plus 3.0 PR2 Released
The Quanta Plus development team is currently churning out more code than at any time in its history. So to keep you in the loop, the Quanta Plus site has been revamped. They have put up some new screenshots and implemented new site features such as a publicly accessible developer todo list. So what's new with Quanta? 3.0 PR2 has been released. You'll find auto-completion for HTML tags and attributes, PHP built-in function auto-completion, a revised document structure tree that recurses PHP structures and embedded HTML, and more. One exciting piece of work in progress is the ability to set different DTDs, as well as offering tagging functionality in the form of pseudo DTDs for scripting languages. There are other fixes and enhancements, with more on the way. ■
http://quanta.sourceforge.net
IPv6 Essentials
IPv6, the next generation Internet Protocol, has been in the works since the early 90s, when the rapid growth of the Internet threatened to exhaust existing IP addresses. Drawing on 20 years' operational experience with the existing protocol (IPv4), the new protocol offers scalability, increased security features, real-time traffic support, and autoconfiguration, so that even a novice user can connect a machine to the Internet. What does this mean for IT workers? Having learned all the strengths and weaknesses of the old protocol, do they abandon that knowledge and start afresh with the new? "IPv6 Essentials" provides a well-organized introduction to IPv6 for experienced network professionals, administrators, managers, and executives. Hagen covers all the new features and functions in IPv6, discussing everything readers will need to understand to get started, including how to configure IPv6 on hosts and routers, and which applications currently support IPv6. In Asia, many ISPs have to use IPv6 today because they do not have enough address space to cover the demand. In Europe and the US, the situation with address space is not as critical yet, but it will become so. IPv6 has many features that will be needed for current and emerging technologies like Mobile IP. ■
http://www.oreilly.com
Browse the code
Hypersrc has been updated to version 5.3.1. Hypersrc is a GPL-licensed GUI program for browsing source code. It displays a list of widgets containing sorted source code tags. The new version adds regular expression searching and makes browsing C++ code easier. ■
http://www.jimbrooks.org/web/hypersrc/hypersrc.html
Sodipodi 0.25 released
Sodipodi is a vector-based drawing program, similar to CorelDraw or Adobe Illustrator from the proprietary software world, and Sketch or KIllustrator from the free software world. A vertical writing mode is now supported. It is free software, distributed under the terms of the GNU General Public License, Version 2. Sodipodi uses W3C SVG as its native file format, making it a very useful tool for web designers. It has a modern display engine, allowing fine antialiased display, alpha transparency, vector fonts and so on. Sodipodi is written in C, using the Gtk+ toolkit and some Gnome libraries. Both source and RPM versions have been released. ■
http://sodipodi.sourceforge.net
Acronis supports Linux journaling filesystems
Acronis is known for its hard disk management and multi-boot products. The company has added Ext3 and ReiserFS support to its new range of tools. The utilities include a disk imaging package called TrueImage. This is used to create exact copies of hard disks or partitions for either backup storage or upgrading hardware. For upgrading hard drives quickly, the MigrateEasy utility can be used. The final product is the OS Selector. This is a multi-boot utility that allows PC users to run multiple operating systems on a single computer, including all versions of Linux. The software also repartitions and reorganizes hard disks for seamless installation of new operating systems. It's the solution for PC users who want a safety net when trying Linux for the first time. ■
http://www.acronis.com
GNOME Accessibility credited
The "GNOME Accessibility Architecture" has been singled out in this year's "Helen Keller Achievement Award in Technology", one of the annual "Helen Keller Awards" presented by the American Foundation for the Blind. These coveted awards recognize the "notable accomplishments of individuals who are role models or improve the quality of life for people who are blind or visually impaired." According to a pre-press release, the GNOME Accessibility architecture demonstrates a commitment to accessible technology which "raises the bar for the computing industry" and "dramatically expands the options available" for technology consumers who are blind or visually impaired. ■
http://www.afb.org
Mozilla 1.0.1 Released
Mozilla 1.0.1 is now available for download. This release is a bug fix and security follow-up to Mozilla 1.0. Mozilla 1.0 users that have not already upgraded to Mozilla 1.1 are strongly encouraged to upgrade to 1.0.1. ■
http://www.mozilla.org
Business News

OMV with Linux on a Success Course
OMV, the largest Austrian oil and gas corporation, is consolidating 27 SAP R/3 systems implemented on various platforms onto a single SAP landscape, utilizing mySAP on Linux on IBM eServer zSeries. First results from the new IBM solution reveal significant performance enhancements for SAP applications, as well as high availability and stability of all systems. "We're absolutely convinced of the quality, availability and scalability of mySAP together with Linux on eServer zSeries. By contrast with other platforms, this solution offers us an opportunity to effectively manage our extensive and heterogeneous IT landscape", says Walter Rotter, IT Manager at OMV. As an internationally active oil and natural gas corporation with a workforce of around 5,700 people, OMV has to rely on a secure and reliable SAP platform to be able to exploit its vast amounts of SAP data. A stable, single central SAP instance, with unprecedented capabilities to manage large volumes of data at the highest availability levels, is the advantage IBM is able to offer its customers with this solution. Installation and a first test run were started within five days. "The IBM zSeries solution is a professional, efficient and cost-effective architecture that enables us to expand the system with ease any time we want. For example, when projects come up at short notice, or start off slowly, or exist only temporarily, the workload can be accommodated without the necessity of adding new hardware or buying more servers, which was the case in the past with SAP upgrades. Instead, with zSeries' unique resource sharing, we eliminated the need for the costly hardware we kept in reserve for peak demand", says Walter Rotter. ■
http://www.omv.com
Linux Cluster for Engine Development
MTU Aero Engines has decided to use a new Linux cluster for its large-scale engine development. The Linux cluster consists of a total of 64 Dell PowerEdge 1550 standard rack servers with two 1.13 GHz Intel Pentium III processors each, and a total main memory of 144 GB SDRAM for the implementation of additional computing capacities when and as required. SuSE Linux Enterprise Server 7 is used as the operating system for the cluster. The computers are networked using Fast Ethernet and 2 GBit/s Myrinet. The decision for the Linux cluster results from the MTU Aero Engines engineers' rapidly rising demands for computing power. At the same time, they expect the price-performance ratio to be as modest as possible. Nowadays, simulation programs requiring a high level of computing are indispensable for the development of
competitive products. Already at the design stage, complex algorithms predict the aerodynamic, thermodynamic, and mechanical properties of components, enabling an early optimization of the geometry of the manufactured components. As is the case with most modern mainframe systems, Linux clusters that strictly comply with standard architectures can be expanded virtually without limit should the current capacity prove inadequate. Server systems which are built with standard components (CPU, bus architecture, operating system etc.) guarantee lower system costs, better cross-node compatibility, as well as comparable availability, resulting in a lower total cost of ownership. MTU Aero Engines intends to expand the Linux cluster soon. ■
http://www.mtu.de/mtu
Caldera no more
Caldera International is changing its name to The SCO Group and changing its logo back to the stylized tree. Caldera bought SCO Unix in 2000 and, after a somewhat drawn-out process, finally changed the corporate branding to Caldera. Now the board has decided to change the name back to SCO in the hope of gaining some leverage from the name. SCO (the Santa Cruz Operation) was well known amongst UNIX vendors and customers, and the renaming aims to build on that goodwill. The products Caldera OpenServer and UnixWare will now be known as SCO OpenServer and SCO UnixWare, along with the new SCO UnitedLinux, which will be released at the end of the year in co-operation with SuSE, Conectiva and TurboLinux. TurboLinux itself has recently been bought by the Japanese Software Research Associates Inc. and aims to focus on the Asian market. The Caldera Partner Programme will be rebranded as TeamSCO. ■
http://www.caldera.com/
Simplified support
Vcentrix has decided to use Simplified's Open voice and data services to provide its customers with a flexible Voice over IP (VoIP) billing solution. The combined Linux version of the Simplified Ventana suite of applications gives a range of tools for rating and routing configuration. It can also handle reporting and customer relations management with a range of easy-to-use web-based tools. As Vcentrix focuses on Internet Service Providers and VoIP-based carriers for its clients, the open source solution gives greater visibility into their communications usage and costs. The Linux solution enables the HiveNet billing service to scale as businesses develop and expand in size. Currently the system supports all Cisco systems; other hardware support is planned. ■
http://www.simplified.com
Support for Intel IOP321 I/O
MontaVista Software Inc. announced the MontaVista Linux Professional Edition 2.1 development environment for the Intel IOP321 I/O processor. The IOP321 processor is based on Intel XScale technology. The XScale microarchitecture is a powerful processor technology targeted at applications in Internet-connected appliances, as well as network infrastructure devices and storage solutions. The new software includes cross development tools and native tool chains for local development, along with extensive real-time features. It now targets 24 processors and supports more than 70 commercial off-the-shelf and reference boards. Additional support for the Intel IXC1100 control plane processor has also been included, which is used for deep packet inspection on line cards and blades. ■
http://www.mvista.com
Sendmail, Inc. and HP demonstrate demand for Linux
Sendmail, Inc. and HP announced significant sales momentum for their combined Linux-based email solution running on HP ProLiant servers, further demonstrating the increase in the adoption of Linux for commercial applications in the enterprise. Since the beginning of the companies' sales and marketing relationship in January 2002, Sendmail, Inc. and HP have signed several deals yielding sales in excess of $5 million, spanning three continents and several industries, including ISPs, financial services, universities, healthcare and government agencies. The relationship between Sendmail, Inc. and the former Compaq began with the packaging of Sendmail's Mailstream Manager and Integrated Mail Suite with Compaq's ProLiant server and SuSE
Linux enterprise server. Sendmail, Inc. signed a similar agreement with HP in February 2002. Sendmail’s relationship with the new HP company continues its focus on the development and marketing of tested and proven Linux-based enterprise solutions for email applications. New customers of the Sendmail and HP solution include a wireless GSM-cell phone provider in Jordan – which has already launched a new service with the HP and Sendmail solution; the American University in Cairo; a Romanian service provider; BAX Global Logistics; HealthNET – a major healthcare provider; Shanghai National Bank; a large Chicago-based law firm; and a US government agency. ■ http://www.sendmail.com
CA launches Linux solutions
Computer Associates International (CA) has launched a number of new solutions for mainframe and distributed Linux environments. For mainframe Linux environments, CA has released new versions of Unicenter Software Delivery and Unicenter Asset Management, as well as beta releases of Unicenter ServicePlus Service Desk and BrightStor Enterprise Backup. For distributed Linux environments, CA has released a new version of eTrust Admin and several BrightStor options and agents. This brings CA's total Linux offering to more than 50 solutions. CA has extended its Linux management tools with new solutions for distributed and mainframe deployments – customers can turn to one of the broadest management portfolios for the integration of resources. "The ability to run a multitude of well-managed virtual servers on a mainframe makes zSeries the platform of choice for Linux at our company," said Bill Dickson, systems manager at R.J. Reynolds Tobacco Company. "CA's integrated systems management and security solutions for VM and Linux provide exactly what we need to protect our mainframe Linux investments. The CA solutions also integrate easily into our existing management infrastructure and enable our staff to fully leverage their current skill sets." CA has worked closely with the Linux community to ensure full compatibility of its solutions with all major distributions. In addition, CA is working with the new UnitedLinux initiative to ensure compliance within its new releases. ■
http://ca.com
SourceForge to run on IBM DB2
Open Source Development Network, Inc. (OSDN), a subsidiary of VA Software, will port SourceForge.net – the largest open source development site on the Web – to run exclusively on IBM's DB2 database software and WebSphere Internet infrastructure software and tools for Linux. SourceForge.net has more than 45,000 open source projects and over 460,000
online users. After an evaluation of open source and commercial databases, including Postgres, MySQL, Oracle, and DB2, OSDN selected DB2 to accommodate its planned future growth of SourceForge.net. The SourceForge.net site has seen outstanding growth since its inception with 220% growth in traffic in the last year and thousands of users working on
projects ranging from Python to Freenet. VA Software and IBM have also entered into a commercial relationship to jointly market and sell the next generation of SourceForge Enterprise Edition with full support for IBM DB2 database software, WebSphere Internet infrastructure and Tivoli management software, as well as eServer xSeries systems. ■ http://www.osdn.com
Insecurity News

Apache/mod_ssl Worm
The Apache/mod_ssl worm scans for vulnerable systems on 80/tcp using an invalid HTTP GET request:

GET /mod_ssl:error:HTTP-request HTTP/1.0

When an Apache system is detected, it attempts to send exploit code to the SSL service via 443/tcp. If successful, a copy of the malicious source code is then placed on the victim server, where the attacking system tries to compile it and, if successful, run it. Once infected, the victim server begins scanning for additional hosts to continue the worm's propagation. The worm can act as an attack platform for distributed denial-of-service (DDoS) attacks against other sites by building a network of infected hosts. During the infection process, the attacking host instructs the newly-infected victim to initiate traffic on 2002/udp back to the attacker. Once this communications channel has been established, the infected system becomes part of the Apache/mod_ssl worm's DDoS network. Hosts can then share information on other infected systems as well as attack instructions. Thus, the 2002/udp traffic can be used by a remote attacker as a communications channel between infected systems to co-ordinate attacks on other sites. ■
CERT Advisory CA-2002-27
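A quick, if crude, way to check whether your own server is being probed is to search the access log for the worm's characteristic request; the log path below is an assumption and varies between distributions:

# look for the worm's invalid GET request in Apache's access log
# (log location assumed; adjust to your installation)
grep 'mod_ssl:error:HTTP-request' /var/log/apache/access_log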
kdelibs
A vulnerability was discovered in KDE's SSL implementation: it does not check the basic constraints on a certificate and as a result may accept as valid certificates that were signed by an issuer who is not authorized to do so. This can lead to Konqueror and other SSL-enabled KDE software falling victim to a man-in-the-middle attack without being aware of the invalid certificate. This will trick users into thinking they are on a secure connection with a valid site when in fact the site is different from the one they intended to connect to. The bug is fixed in KDE 3.0.3, and the KDE team has provided a patch for KDE 2.2.2. ■
Mandrake reference MDKSA-2002:058
Table 1: Security Posture of Major Distributions

Debian
Security sources: Info: www.debian.org/security/, List: debian-security-announce, Reference: DSA-… 1)
Comment: Debian have integrated current security advisories on their web site. The advisories take the form of HTML pages with links to patches. The security page also contains a note on the mailing list.

Mandrake
Security sources: Info: www.mandrakesecure.net, List: security-announce, Reference: MDKSA-… 1)
Comment: MandrakeSoft run a web site dedicated to security topics. Amongst other things the site contains security advisories and references to mailing lists. The advisories are HTML pages, but there are no links to the patches.

Red Hat
Security sources: Info: www.redhat.com/errata/, List: www.redhat.com/mailing-lists/ (linux-security and redhat-announce-list), Reference: RHSA-… 1)
Comment: Red Hat categorizes security advisories as Errata: under the Errata headline any and all issues for individual Red Hat Linux versions are grouped and discussed. The security advisories take the form of HTML pages with links to patches.

SCO
Security sources: Info: www.sco.com/support/security/, List: www.sco.com/support/forums/announce.html, Reference: CSSA-… 1)
Comment: You can access the SCO security page via the support area. The advisories are provided in clear text format.

Slackware
Security sources: List: www.slackware.com/lists/ (slackware-security), Reference: slackware-security … 1)
Comment: Slackware do not have their own security page, but do offer an archive of the security mailing list.

SuSE
Security sources: Info: www.suse.de/uk/private/support/security/, Patches: www.suse.de/uk/private/download/updates/, List: suse-security-announce, Reference: suse-security-announce … 1)
Comment: There is a link to the security page on the homepage. The security page contains information on the mailing list and advisories in text format. Security patches for individual SuSE Linux versions are marked red on the general update page and comprise a short description of the patched vulnerability.

1) Security mails are available from all the above-mentioned distributions via the reference provided.
krb5
The network authentication system in Kerberos 5 contains an RPC library that includes an XDR decoder derived from Sun's RPC implementation. This implementation is vulnerable to a heap overflow. With Kerberos, it is believed that an attacker would need to be able to successfully authenticate to kadmin to be able to exploit this vulnerability. ■
Mandrake reference MDKSA-2002:057
Updated gaim client fixes URL vulnerability
Updated gaim packages are now available for Red Hat Linux 7.1, 7.2, and 7.3. These updates fix a vulnerability in the URL handler. Gaim is an all-in-one instant messaging client that lets you use a number of messaging protocols, such as AIM, ICQ, and Yahoo, all at once. Versions of gaim prior to 0.59.1 contain a bug in the URL handler of the manual browser option. A link can be carefully crafted to contain an arbitrary shell script which will be executed if the user clicks on the link. Users of gaim should update to the errata packages containing gaim 0.59.1, which is not vulnerable to this security issue. ■
Red Hat reference RHSA-2002:189-08
New wordtrans packages fix remote vulnerabilities
The wordtrans-web package provides an interface to query multilingual dictionaries via a web browser. Guardent Inc. has discovered vulnerabilities which affect versions up to and including 1.1pre8. Improper input validation allows for the execution of arbitrary code or injection of cross-site scripting code by passing unexpected parameters to the wordtrans.php script. The wordtrans.php script then unsafely executes the wordtrans binary with the malformed parameters. All users of wordtrans are advised to upgrade to the errata packages
which contain a patch to correct this problematic vulnerability. ■ Red Hat reference RHSA-2002:188-08
Updated ethereal packages are available
Ethereal is a package designed for monitoring network traffic on your system. A buffer overflow in Ethereal 0.9.5 and earlier allows remote attackers to cause a denial of service or execute arbitrary code via the ISIS dissector. Users of Ethereal should update to the errata packages containing Ethereal version 0.9.6. ■
Red Hat reference RHSA-2002:170-12
i4l
The i4l package contains several programs for ISDN maintenance and connectivity on Linux. The ipppd program, which is part of the package, contained various buffer overflows and format string bugs. Since ipppd is installed setuid to root and executable by users of the group 'dialout', this may allow attackers with the appropriate group membership to execute arbitrary commands as root. The i4l package is installed by default and is also vulnerable if you do not have an ISDN setup. The buffer overflows and format string bugs have been fixed. We strongly recommend an update of the i4l package. If updating the package is not an option, it is also possible to remove the setuid bit from /usr/sbin/ipppd as a temporary workaround. The SuSE Security Team is aware of a published exploit for ipppd that gives a local attacker root privileges, so you should either update the package or remove the setuid bit from ipppd. ■
SuSE reference SuSE-SA:2002:030
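The workaround amounts to a single command, run as root; it only defuses the published local root exploit until the updated package is installed:

# strip the setuid-root bit from ipppd (temporary workaround)
chmod u-s /usr/sbin/ipppd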
glibc
An integer overflow has been discovered in the xdr_array() function, contained in the Sun Microsystems RPC/XDR library, which is part of the glibc library package
on all SuSE products. This overflow allows a remote attacker to overflow a buffer, leading to remote execution of arbitrary code supplied by the attacker. There is no temporary workaround for this security problem other than disabling all RPC based server and client programs. The permanent solution is to update the glibc packages. ■ SuSE reference SuSE-SA:2002:031
cacti
A problem has been discovered in cacti, a PHP-based front-end to rrdtool for monitoring systems and services. It could lead to cacti executing arbitrary program code under the user id of the web server. The problem has been fixed by removing any dollar signs and backticks from the title string in version 0.6.7-2.1 for the current stable distribution (woody) and in version 0.6.8a-2 for the unstable distribution (sid). ■
Debian reference DSA-164-1
Zack’s Kernel News Gentle Hands for IDE coding Last month I explained to you that the IDE code was being rewritten in 2.5 by Marcin Dalecki, amid heated controversy. Recently, Marcin decided to give up the fight, and all of his code changes have been removed from the 2.5 kernel tree. A new set of changes by Andre Hedrick and others, that had been in development in the 2.4 kernel tree, have been forward-ported to 2.5; Alan Cox, although not the primary IDE developer, has agreed to take on the role of maintainer for the moment. Andre, who would otherwise have been the obvious choice as the maintainer, demands such gentle personal handling that Linus Torvalds has found him impossible to work with. As a result of this Andre and everyone else working on IDE, will feed their changes to Alan, who will pass them along to Linus. The IDE layer has been a problem in the kernel for quite awhile, and most developers agree it has been brought to an unmaintainable mess over the years, which may explain why there is so much contention around it. One of the reasons Marcin came under such heavy fire was because of his uncompromising insistence on ripping out all of the broken code pieces, regardless of whether working replacements for the removed code were available or not. The standards documents themselves may also be at fault; IDE discussions on the linux-kernel mailing list often examine the standards line by line in detail, and still lead to no clear agreement of what was intended by the standards body. Andre, who has worked very closely with the standard bodies, claims to understand both the letter and the spirit of their documents; unfortunately he seems so far to be unable to share information without hurling insults at the people he is informing. ■
Speakers start to talk
The PC Speaker driver has been broken in the 2.5 kernel for a long time, and is finally receiving some attention from Stas Sergeev. But since the breakage was the result of correct changes to the Virtual Filesystem subsystem, as opposed to bad changes in the speaker driver itself, the new work has involved more than just bug fixing, and has been a long time coming. So far, reports have come in that MP3s play well using the driver, but there have also been reports of other noise intruding on the proper sound. While Stas feels that this is almost certainly a problem with specific broken motherboards, and not a bug in his code, there are apparently other problems that may keep his driver out of the main kernel tree. For one thing, many modern motherboards come with soundcards already on them, making a speaker driver superfluous. For another, the standards governing PC speaker hardware are weak, so the great variety of possible configurations makes it difficult for any speaker driver to get the best performance out of the speaker. And finally, Stas' driver in its current form uses a ton of CPU time. Stas feels this last is only a minor objection, since there is room to further improve his code. But he also points out that some motherboards are still made without sound cards, in which case the speaker would be the only source of sound on the computer. But he does agree that his code as it stands isn't ready for inclusion in the main tree. ■

INFO
The Kernel Mailing List comprises the core of Linux development activities. Traffic volumes are immense and keeping up to date with the entire scope of development is a virtually impossible task for one person. One of the few brave souls who take on this impossible task is Zack Brown. Our regular monthly column keeps you up to date on the latest decisions and discussions, selected and summarized by Zack. Zack has been publishing a weekly digest, Kernel Traffic, for several years now; reading just the digest is a time-consuming task. Linux Magazine now provides you with the quintessence of Linux kernel activities straight from the horse's mouth.
Version 4 this way comes
NFSv4 is coming to the kernel. A number of developers have been working on this for a while, and patches have begun cropping up for both the 2.4 and 2.5 trees. Now that some of the initial patches have laid the groundwork, Kendrick M. Smith has started to feed Linus and Marcelo patches that implement the actual server code. NFSv4 seeks to answer some of the objections to earlier NFS versions, and to extend NFS further into new areas that were not taken into account when previous versions were designed. In particular, NFSv4 promises support for IPv6, strong security, good cross-platform interoperability, and, in general, support for a range of extensions in other protocols. NFSv4 also promises to maintain the best features of earlier NFS
versions, such as easy recovery and independence from particular transport protocols. Mounting a networked filesystem is inherently tricky. It is difficult, for example, for the operating system to be certain not to reuse inodes. A duplicate inode can cause data loss or corruption, and the difficulties involved in reducing the risk of duplicate inodes in NFS has been the cause of much head shaking among kernel developers. Latency issues have also plagued developers over the years, especially the question of how to be certain that rapid or nearly simultaneous changes at one end of the network connection are accurately represented to the user at the other end. Hopefully NFSv4 will address these issues as well. ■
Lack of standards
POSIX compliance has always been a problem, mainly because Linus and the rest of the kernel developers never hesitate to abandon a standard if they feel it makes no sense. This was illustrated long ago in the clone wars, in which Linus eventually compromised by implementing POSIX thread-creation on top of semantics that he believed made much better sense. Linus recently characterized POSIX compliance in these words: "POSIX is a hobbled standard, and does not matter. We're not making a 'POSIX-compliant OS'. People have done that before: see all the RT-OS's out there, and see even the NT POSIX subsystem. They are uninteresting. Linux is a _real_ OS, not some 'we filled in the paperwork and it is now standards compliant'."
The question is then: what does constitute a standard to which Linux adheres? This is important for systems that wish to be Linux-compatible. If an OS wishing to be Linux-compatible must rely only on the current state of the Linux code, it will be difficult to guarantee that compatibility won't be broken in the next kernel release. But it seems that any notion of a true "Linux standard" has not yet solidified. Certainly POSIX and other legacy semantics play a large part. But they are not the final word. It may be one of the great strengths of Linux that it is unwilling to bow to tradition; but non-compliance with standards is also an accusation frequently made by free-software proponents against proprietary software companies like Microsoft. ■
Figure 1: Zack’s Kernel-Traffic web site
Raising the disk limit
It should soon be possible to support 128 or 256 SCSI disks on a single system, in both 2.4 and 2.5 kernels. Kurt Garloff posted some patches to do this in 2.4; and while Alexander Viro felt that these patches were not going to slide neatly into 2.5, Kurt felt that certain parts of the patch would not be too much trouble to forward-port. EVMS could probably bring the full functionality to 2.5 without a problem, but Linus Torvalds made it clear at the recent Kernel Summit that he did not want EVMS to continue encroaching on the block layer's domain. A number of key developers seem to be in favour of pursuing Kurt's work with an eye toward acceptance into the 2.4 and 2.5 trees. Raising the maximum number of SCSI disks is a long-standing problem. Solutions were being proposed as far back as 1992, when Linux was barely a year old. Richard Gooch offered patches in late 2001 to raise the maximum number to over 2000 disks, but his patches were not accepted. This latest attempt by Kurt shows the most promise of actually being accepted, though of course, the task will then be to raise the limit still further. The quest to support bigger, taller systems is ongoing. Large memory, many processors, large files, large filesystems, large disks, large numbers of disks; at every level, developers struggle to support big systems, while still continuing to support smaller desktops and older hardware. ■
Taking the guess out of benchmarks
A new tool for benchmarking the Virtual Memory subsystem has emerged: VM Regress, by Mel Gorman. It is still in the early stages of development, but it's already useful. VM Regress has the ambitious goal of "eventually eliminating guesswork in development." Although developed for 2.4 kernels, it compiles under the 2.5 kernel as well. The tool is not intended to benchmark real-world scenarios, but instead performs 'micro-benchmarks' of particular subsystems, on the assumption that if each individual
subsystem or component performs well, then the whole system will perform well. This is not necessarily a safe assumption, however, as VM development has shown in the past. Often an idealized benchmark has shown one VM version to be ‘better’ than another, while users report subjective impressions that are the exact opposite of the test results. At the same time, restricting benchmarks to only real-world loads will never provide specific, fine-grained numbers about particular areas of the VM.
The quest for the perfect VM benchmark is ongoing. This may in part account for the tremendous divisiveness surrounding VM development; Rik van Riel and Andrea Arcangeli have been proposing competing implementations of VM for years, with both sides having their egos bruised. Linus’ decision to uproot Rik’s VM and replace it with Andrea’s in the midst of the 2.4 “stable” series, was met with tremendous criticism; though eventually most of the critics did come to believe Linus made the right choice. ■
Letters to the editor
Write Access
Your views and opinions are very important to us. We want to hear from you, about Linux related subjects or anything else that you think would interest Linux users! Please send your letters to: Linux Magazine Stefan-George-Ring 24 81929 Munich Germany e-mail: letters@linux-magazine.com Please tell us where you are writing from.
Linking ideas
Q: I have installed a program which adds commands to my system. I have to be in the directory where I loaded the program to make the commands work. Where do I need to make the symbolic links so the commands work no matter where I am in the directory structure? K Cheng, Gosford, NSW, Australia

A: Symbolic links redirect the kernel to the pathname stored as the link's contents. These can be created using:

ln -s /home/jack/jack.h /usr/include/jack.h

Here a file is created in /usr/include called jack.h which is redirected to /home/jack/jack.h. However, this is not the way to go about adding the commands to your system. You need to add the directory path of the commands to the environment variable PATH. If our new commands are in the directory /home/jack/myprog/ we use:

export PATH=/home/jack/myprog:$PATH
Now when we type in a command, the shell searches through all the directories listed in the PATH variable to find the command. You can see where PATH is checking by typing:

[root@home jack]# echo $PATH
/home/jack/myprog:/sbin:/usr/sbin:/bin:/usr/bin:/usr/local/bin
Keeping secrets
Q: I have managed to install MandrakeLinux on my PC. I have to share the computer. How do I add my sister as another user to the system and keep her away from my private email? R McQueen – email

A: Depending on the desktop you have running, you can add a user either by typing userdrake in a shell to access a GUI-based wizard, or from the KDE panel by using the Configuration/Other/UserConf option to add and modify users. If you are not running X you can use the useradd command as superuser.
[jack@home jack]# su
Password: ********
[root@home jack]# useradd jill
This creates the user called jill in a group called jill. It also creates a home directory, /home/jill. The next step is to create a password for Jill. The UserConf wizard allows you to add a password, but from the command line you need to use the following:

[root@home jack]# passwd jill
Changing password for user jill
New UNIX password: *********
Retype new UNIX password: ********
passwd: all authentication tokens updated successfully
[root@home jack]# exit
When jill now logs onto the system, she is in her own home directory, /home/jill/. Your email and settings are protected in your own directory, which she has no access to. ■
Spelling it my way
Q: How do I make an exclude dictionary in StarOffice 5.2? In MS Word I do this by creating a list of words I do not want in the standard dictionary and renaming it the same as my normal dictionary but with .exc as the ending. I use this tool in my job and do not want to do without it. M Chrisostomou, London

A: StarOffice makes this task a little easier to accomplish. Normally when using the spellchecker we can add words that are not in the standard built-in dictionary by placing them into a custom dictionary. Similarly, if we want to remove a word we need to create a custom exception dictionary. Suppose that we want to never use the word "hopefully". We start by using Tools/Options, then from the general menu choose Language. On the right you can now highlight the exception dictionary. Choose the edit icon and you can then add any word you do not like. In our example we also add a replacement word, "cheesecake". Now whenever we invoke the spellchecker it will, upon finding "hopefully", ask if we should replace it with our new word "cheesecake". ■

Figure 1: Configuring the exception dictionary

Together with Kate
Q: I tried using the KDE editor Kate. It ran fine and I use it as my default editor. When choosing a file to open, the "File List" now appears in a separate window. How do I recombine it so it just appears in Kate as a split window, similar to Konqueror? WD Parkin, Brugge, Belgium

A: You can either change the settings within Settings/Configure Kate, or you can remove your katerc configuration file, which will be found in your personal home directory at ~/.kde/share/config/katerc. This will then reset Kate. ■
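From a shell, resetting Kate comes down to deleting that one file (make sure Kate is not running when you do so):

# remove Kate's configuration file to reset it to the defaults
rm ~/.kde/share/config/katerc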
Figure 2: The Undocked File selector
Friendly e-mails – hidden problems
Q: I have started to use Linux after many years of using Outlook. I have successfully configured it to download my mail. The only problem is that HTML mail sent from a friend is displayed as code. Can KMail support HTML and if so what do I need to add? S George
Figure 3: Kmail in plain text mode
A: KMail will support HTML coding and display it as you want. Choose the Settings/Configure KMail option. Select the Security option and the General tab. Here you can turn on "Prefer HTML to plain text". The drawback to this is the possibility of security flaws from running HTML automatically. Emails from a
friend are fine, but spam is normally sent in HTML. If you download a spam message in plain text, you can usually see that it is junk mail. With HTML turned on, it would start to connect to whatever advertising site is in the spam. It is considered bad netiquette to send HTML emails outside the world of Microsoft. ■
Figure 4: Kmail with HTML turned on
Apache on Linux: The Modules, the Server and the Project
Winning Team

This month's issue of Linux Magazine features five articles on Apache. Running Apache under Linux makes for a knock-out combination. These articles discuss some modules, upgrading to Apache 2.0, and the Apache Software Foundation. BY ACHIM LEITNER, ULRICH WOLF

The Linux and Apache team are capable of winning against all comers. The size of the challenger does not matter. Team work and modular building make for a responsive champion. Most administrators will confirm that Linux is a stable and successful network professional. And the Netcraft survey clearly shows Apache's pervasiveness (see Figure 1). Linux guards Apache's back, providing it with the peace of mind it needs to devote itself entirely to serving up web pages. Additional modules enhance Apache's functionality.
One reason for the staggering long-term consumer satisfaction with Apache is the web server's modular design, which allows Apache to adapt to changing roles. Three of the five articles forming our cover feature are thus concerned with modules. Finding and selecting the right Apache modules is a science in itself. There are ready-made solutions for many issues, although most webmasters may never have heard of them. We will be introducing a selection of useful tools from page 19 onwards, and taking an in-depth look at the modules Mod_rewrite, the "Swiss army knife" of URL manipulation (page 22), and Mod_gzip – used for serving up compressed data from web pages (page 26).

The Old and the New
Version 2.0 sees Apache reaching new levels of modularity, offering modules even for core functionality such as multithreading. The article on page 29 provides you with further details on this and other new features. Our special is rounded off by a report on the Apache Software Foundation (ASF), which provides a home for innumerable successful projects. Starting on page 32, Linux Magazine shows you why the ASF is so attractive for developers. ■

Cover Story
Apache Modules .................. 19
A small selection of useful Apache modules, chosen from the hundreds that are available for the web server.
Mod_rewrite ..................... 22
Apache can use the mod_rewrite module to automatically re-write requested URLs.
Mod_gzip ........................ 26
The Mod_gzip module provides web admins with jet propulsion.
Apache 2.0 ...................... 29
Administrators need to think about the jump from Apache Server 1.3 to version 2.
Apache Projects ................. 32
Apache is more than a web server. Why is the Foundation so attractive to developers?
Figure 1: Since 1996 the majority of web servers all over the world have been Apaches with Microsoft’s IIS well down in second place. (Source: Netcraft, www.netcraft.com/survey)
A Small Selection of Useful Apache Modules
Docking Station

There are literally hundreds of modules for the Apache web server. And that makes it difficult to pick out the goodies, or even find the right software for the job in hand. This is the starting point for a compact overview of the authors' favorite modules. BY STEFAN WINTERMEYER, RONALD KARSTENS

A web server is a simple piece of software when you think about it. Reading any HTML file from the file system and serving it up to the browser is a task that a competent programmer could probably handle with a few hundred lines of code. But of course a good web server like Apache can do a whole lot more. And modules are the source of most of the web server's intelligence. The Apache web server has gone through a similar evolution to the Linux kernel in this respect. It used to be monolithic, but now more and more functionality is provided by modules. This is particularly evident in the new Apache 2.0 version, although there are still probably more modules available for Apache 1.3 than the authors have had hot dinners [1]. Having said that, a handful of practical modules can make a web admin's daily grind a whole lot easier.
MPM: Multiprocessing Modules in Apache 2.0
Apache 2.0 sees the introduction of a new group of modules: the server delegates basic network and operating system functionality to so-called multiprocessing modules (MPMs). The MPMs in turn allow Apache to process multiple requests simultaneously. The admin has several options to choose from – but changing options is simple if they are modular. MPMs are discussed in detail in the article on page 29.
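The MPM is chosen when the server is compiled. The configure flag below is Apache 2.0's standard mechanism; worker (threaded) is shown as one example, with prefork being the non-threaded alternative:

# build Apache 2.0 with the threaded worker MPM
$ ./configure --with-mpm=worker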
Mod_speling: "Spel"checking for URLs
Good web servers provide a structure that allows users to access some content directly via URLs. Sometimes users will more or less know the URL, although they are not sure about the case. Mod_speling can help you solve this issue. As you might have guessed from the way the module misspells "spelling", this module corrects typos. If there is a small typo in the URL a web user provides, instead of issuing an error message the server will point the user to the right page. The module will correct up to one mistyped character and any number of lower/upper case errors. This will save most web users headaches, or yet another visit to Google. Take a look at [2] for more information on mod_speling.
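Enabling the module is a one-liner. The sketch below assumes mod_speling was built as a DSO (the module path varies between installations); the CheckSpelling directive itself is standard:

LoadModule speling_module modules/mod_speling.so

<IfModule mod_speling.c>
  # fix case errors and single-character typos in requested URLs
  CheckSpelling On
</IfModule>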
Mod_include: SSI – Server Side Includes
When the first frame-based web pages were introduced, some webmasters ironically suggested that their colleagues possibly needed so many frames because they had never heard of server side includes (SSIs). And there is certainly a modicum of truth in this statement: If you want to use the same header, a footer and a menu from the current directory without frames, there is nothing to prevent you using native tools and ditching the frames, and the SSI "include" element is the way to do this. SSI instructions are directly embedded in special tags in HTML documents.
"<!--#include virtual="File" -->" will cause SSI to read the content of a file at the tag position before the server serves up the page. Most standard distributions use an Apache configuration that passes "*.shtml" files to the SSI module ("mod_include"). If this does not work straight away, a quick look at your "httpd.conf" may provide some clues. You will need the following entry:

AddType text/html .shtml
AddHandler server-parsed .shtml
The "Options +Includes" setting in the "<Directory>" section is used to enable server side includes. SSI can also be used recursively, as you can see from Listings 1 and 2. In line 11, Listing 1 integrates the "/includes/footer.shtml" file. This file is also parsed by the SSI module and will output the current date in about line 4. If you need to, you can use SSIs to run commands on the web server and
include the result. The following instruction adds the output from "ls" to a web page:

<!--#exec cmd="ls" -->

External commands can take so long to execute that the page build-up is either extremely slow, or does not happen at all (timeout). But otherwise there is no real reason to expect SSIs to cause performance bottlenecks on large web sites. It does not always make sense to enable SSI for the ".html" and ".htm" suffixes, but if you do, your web server should handle the situation gracefully. If users other than the webmaster can create HTML files, "#exec" can open up a large security hole, allowing normal users to execute commands with the web server's privileges. You will need to apply the "Options +IncludesNOEXEC" configuration to close the gap in this scenario. SSIs will still be enabled, but dangerous commands will not. For more information and examples on SSIs see [3] and [4].

Listing 1: Example of SSI

<html>
<body>
<!--#include virtual="/includes/header.html" -->

<table><tr><td>
<!--#include virtual="menue.html" -->
</td><td>
hello world!
</td></tr></table>

<!--#include virtual="/includes/footer.shtml" -->
</body>
</html>

Listing 2: "footer.shtml"

<!-- begin footer -->
<HR>
<!--#config timefmt="%A %B %d, %Y" -->
Today is <!--#echo var="DATE_LOCAL" -->
<BR>
<!--#config timefmt="%D" -->
This file last modified <!--#echo var="LAST_MODIFIED" -->
<!-- end footer -->
Mod_php: Server Side Scripting
If SSIs do not provide sufficient functionality for you, you might like to try PHP as a universal tool for dynamic web pages. The server will first parse the page at runtime and then serve up the normal HTML files. Mod_php will handle any files with the "*.php" suffix. Listing 3 contains a Hello World program in PHP. As you can see in line 3, this module also uses special tags to include commands in HTML files: "<?php command ?>". As PHP is a fully-fledged scripting language with various database interfaces, there are no limits to what you can do. Installing PHP is quite complex, as you need to enable a variety of add-ins and options prior to this step. This applies equally to dynamic libraries and PHP modules, and to the question as to which database interfaces you will need. We recommend compiling PHP as a DSO module (Dynamic Shared Object), as you can then replace PHP without re-installing your Apache. Although the documentation still has not caught up to Apache 2.0, the current PHP versions seem to work well with the new server. The following example of a "configure" call uses MySQL as its database backend:

$ CC=gcc CFLAGS=-O2 LDFLAGS=-s \
  ./configure \
  --with-mysql=/opt/mysql \
  --with-apxs2=/opt/apache/bin/apxs \
  --prefix=/opt/apache \
  --sysconfdir=/etc/php

Listing 3: "hello-world.php"

<html>
<body>
<?php echo "Hello World<p>"; ?>
</body>
</html>
Mod_perl: Embedding Scripting Languages in HTML
We should not forget Perl while we are discussing PHP. Dynamic websites in Perl do not require any Apache modules, as you can write CGI programs in Perl. This variant does have its disadvantages, as it means launching a Perl instance every time the page is opened. A new process is created, and the whole Perl interpreter is loaded into virtual memory, where it can translate and run the CGI program. Of course this works, but it is slow and does not scale well. One possible solution is to use an Apache module. When you launch your Apache, mod_perl loads a single instance of Perl which stays active and handles any requests for Perl programs. This assumes that you included mod_perl when compiling the web server, or loaded the DSO module – and this is no trivial task. Some CGI programs also require you to perform modifications, and you will not be able to harness the full power of mod_perl without adapting your scripts. See [6] for a howto and some useful tips. Warning: Apache 2.0 requires a mod_perl version from the 2.0 series, which is currently beta – this is one reason why some production servers are still hesitating before moving up to Apache 2.0.
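As an illustration, a minimal Apache 1.3 configuration for running existing CGI scripts under mod_perl 1.x might look like the following. The module path and the /var/www/perl location are assumptions; the handler names are standard mod_perl 1.x:

# load mod_perl if it was built as a DSO (path is installation-specific)
LoadModule perl_module libexec/libperl.so

Alias /perl/ /var/www/perl/
<Location /perl>
  SetHandler perl-script
  # Apache::Registry runs unmodified CGI scripts inside the
  # persistent interpreter, avoiding a fork per request
  PerlHandler Apache::Registry
  PerlSendHeader On
  Options +ExecCGI
</Location>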
Mod_bandwidth: Less Bandwidth is More
Most webmasters are familiar with this dilemma: you have been asked to place a video or an MP3 file by an up-and-coming artist on your web server – but without using up all your bandwidth. You can use a bandwidth limiter to do so. Most Linux distributions do not provide a module for this task, and to make things worse there are several different approaches. You can refer to [7], using the "bandwidth" search key, for a list of available modules; documentation for one of them, mod_bandwidth, is at [8]. If you are experiencing general bandwidth problems, you might like to refer to the section on the Traffic Shaper [9] in the networking howto.
Listing 2: "footer.shtml"
<!-- begin footer -->
<HR>
<!--#config timefmt="%A %B %d, %Y" -->
Today is <!--#echo var="DATE_LOCAL" --> <BR>
<!--#config timefmt="%D" -->
This file last modified <!--#echo var="LAST_MODIFIED" -->
<!-- end footer -->
Listing 3: "hello-world.php"
<html>
<body>
<?php echo "Hello World<p>"; ?>
</body>
</html>
But be careful if you impose bandwidth restrictions: even a slight error can severely impact your web server's performance.
Mod_mp3: Simple MP3 Streaming
With bandwidth restrictions and the MP3 keyword, the obvious place to go next is an MP3 server of your own. Again, Apache can offer you a module for this task, allowing you to provide a constant stream of music to MP3 clients such as XMMS [10], instead of simply allowing shared access to the audio files. If you want to run your own radio station on your intranet (with copyright-free music, of course), you should definitely take a look at mod_mp3 streaming [11]. Again, most distributions do not include this module, and you will need to compile it and include it in your Apache implementation. The configuration steps are simple. In Listing 4 the module is used to serve up the MP3 files in "/home/export/mp3" and the author's two favorite songs (lines 7 through 9), at random (line 10), as "Stefan's Radio" (line 4).
Mod_auth_ldap: Using LDAP for User Authentication
Most web sites have restricted areas intended only for specific users. External web servers can use the "mod_auth" mechanism [12] to handle this requirement: the user credentials are stored in a password file on the web server. But the demand for single sign-on for intranet web servers continues to increase; failing that, you may be asked to at least ensure that the same password can be used for the whole range of services on offer. Centralized password management on a Novell, Microsoft, or OpenLDAP server [13] is quite common, and there is no reason for Apache to hide its light under a bushel. Mod_auth_ldap [14] uses LDAP for user authentication. Listing 5 shows a configuration example.
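For comparison, the classic file-based "mod_auth" setup [12] that mod_auth_ldap replaces looks roughly like this – the directory, realm and password file name are invented for illustration:

<Directory "/usr/local/http/htdocs/internal">
    AuthType Basic
    AuthName "Staff only"
    # Flat file maintained with the htpasswd tool
    AuthUserFile /usr/local/http/conf/htpasswd
    require valid-user
</Directory>

The LDAP variant in Listing 5 simply swaps the local password file for queries against the directory server.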
Mod_ssl: Serving Up Web Pages with SSL and TLS
Encrypted communication is becoming increasingly important, and this development is reflected by Apache. Only a few months ago, integrating encryption modules used to be a fairly complicated and time-consuming task. However, Apache 2.0 comes preconfigured with mod_ssl [15]. Ensure that you have the OpenSSL libraries (which should be installed by default on any current Linux distribution), and then type "./configure --enable-ssl" to compile Apache with SSL in place. A production HTTPS server requires a cryptographic certificate. Depending on your target group, you may decide to approach a commercial Certificate Authority or set up your own CA. The certificate and key files are then stored below the Apache configuration directory and read on launching the server, possibly after prompting you for a password. ■
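A minimal HTTPS virtual host might then look like the following sketch – the host name and the certificate and key locations are assumptions based on the layout just described:

<VirtualHost _default_:443>
    ServerName www.example.com
    DocumentRoot /opt/apache/htdocs
    SSLEngine on
    SSLCertificateFile /opt/apache/conf/ssl.crt/server.crt
    SSLCertificateKeyFile /opt/apache/conf/ssl.key/server.key
</VirtualHost>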
Listing 5: LDAP Authentication
<Directory "/usr/local/http/htdocs/a-team-doku">
Options Indexes FollowSymLinks
AllowOverride None
order allow,deny
allow from all
AuthName "A-Team only"
AuthType Basic
LDAP_Server ldap.example.com
LDAP_Port 389
Base_DN "o=A-Team HQ,c=DE"
#Bind_Pass "secret"
UID_Attr uid
require valid-user
</Directory>
Listing 4: MP3-Streaming-Server
<VirtualHost mp3.example.com:8000>
ServerName mp3.example.com
MP3Engine On
MP3CastName "Stefan's Radio"
MP3Genre "European Trance"

MP3 /home/export/mp3
MP3 /tmp/favesong1.mp3
MP3 /tmp/favesong2.mp3
MP3Random On
# Increase this if your connections are timing out
Timeout 1200
ErrorLog /var/log/httpd/music-stream_error_log
</VirtualHost>

Info
[1] Overview of Apache Modules: http://modules.apache.org/
[2] Typo Correction: http://httpd.apache.org/docs/mod/mod_speling.html
[3] Server Side Includes: http://httpd.apache.org/docs/mod/mod_include.html
[4] SSI Howto: http://httpd.apache.org/docs/howto/ssi.html
[5] PHP Tutorial: http://www.php.net/manual/en/tutorial.php
[6] Mod_perl Download and Documentation: http://perl.apache.org/
[7] Search for Apache Modules: http://modules.apache.org/search
[8] Docs for mod_bandwidth: http://www.cohprog.com/v3/bandwidth/doc-en.html
[9] Info for Traffic Shaper: http://www.tldp.org/HOWTO/Net-HOWTO/x1416.html
[10] XMMS: http://www.xmms.org
[11] Mod_mp3: http://media.tangent.org/
[12] User Authentication: http://httpd.apache.org/docs/howto/auth.html
[13] OpenLDAP: http://www.openldap.org/
[14] Mod_auth_ldap Homepage: http://www.muquit.com/muquit/software/mod_auth_ldap/mod_auth_ldap.html
[15] Introduction to SSL and TLS on Apache: http://httpd.apache.org/docs-2.0/ssl/
[16] Apache 2.0 Module Documentation: http://httpd.apache.org/docs-2.0/mod/
[17] Apache and Various Modules: http://www.apachetoolbox.com/
Re-Writing URLs with the Apache mod_rewrite Module
Black Magic
Static URLs pointing to static pages? No way! Apache can use the mod_rewrite module to automatically re-write requested URLs. BY MARC ANDRÉ SELIG
If a browser or any other HTTP client sends a request to a web server, the most important element of the request will always be a URL. The URL uniquely identifies the required resource. But a URL does not need to point to a file on the server's hard disk – this is obviously the case where CGI scripts or other dynamic pages are involved. Apache needs to locate the resource even when a file is not directly referenced. To do so, the server is told the path to the "DocumentRoot" in the configuration file "httpd.conf". On the other hand, directives such as "Alias", "ScriptAlias", and "Redirect" can be used to redirect incoming requests.
The mod_rewrite module, which ships with any Apache 1.3 or 2.0 distribution, is far superior to the methods mentioned so far. The module uses regular expressions to check the requested URL and, if needed, one or two additional variables. If a number of conditions apply, the module replaces the URL with a new one, although you can still access some elements of the original address. In other words, mod_rewrite can either replace entire URLs or simply re-write parts of them. This is why the author of this module, Ralf S. Engelschall, calls it the "Swiss Army Knife of URL manipulation". But Brian Moore hits the nail on the head: "Mod_rewrite is voodoo. Damned cool voodoo, but still voodoo." Mod_rewrite is one of the most complex elements in Apache – the control logic is not particularly intuitive, and debugging is a nightmare. So there are good reasons for not making it part of Apache's standard equipment – the maintainers simply do not want to confuse newcomers.
Not Automatically Included
This does not mean that mod_rewrite is not included with the web server, but "./configure" will not compile the module into "httpd" unless you explicitly tell it to do so. Some Linux distributions take this step for you. You might therefore like to type "httpd -l" to discover the modules already compiled into your Apache.
If the output does not contain a reference to "mod_rewrite.c", you may still find that the module was compiled as a Dynamic Shared Object (DSO). In this case you will need to look for a file called "mod_rewrite.so" in your Apache module directory; you can type "locate mod_rewrite.so" to do so. If you draw a blank again, you will need to re-compile Apache, supplying the "--enable-rewrite" option to "./configure". The possibility of compiling the module as a loadable DSO is probably more interesting for Linux distributors than for the webmaster of a production site – the advantage being that an unused DSO is completely removed from the web server, the disadvantage a slight performance impact.
Configure that Server!
To enable the DSO variant of the module, add the following line to "httpd.conf":

LoadModule rewrite_module modules/mod_rewrite.so
Other options can be placed in the server configuration file "httpd.conf", or in the directory-specific ".htaccess" configuration files. (Of course, this only applies if you have not already disabled parsing of ".htaccess".) The first important directive is "RewriteEngine on", which enables the module. If this is missing, mod_rewrite will do nothing at all. For troubleshooting, it is a good idea to log all the module's activity right from the outset; use "RewriteLog" to supply the logfile required for this purpose. If everything works, you will probably want to disable logging later by commenting out the directives in Listing 1, as logging consumes valuable resources and impacts web server performance.

Listing 1: "httpd.conf" entries for mod_rewrite
LoadModule rewrite_module modules/mod_rewrite.so
RewriteEngine on
RewriteLog "/export/apache/logs/rewrite.log"
RewriteLogLevel 2

Your Apache will call mod_rewrite when handling requests. The module uses a set of rules (defined by "RewriteRule" in "httpd.conf" or ".htaccess"), each of which contains a search pattern, a replacement string, and possibly one or more flags. Additional flags to the ones shown in Table 1 are available for complex tasks; see [1] for a comprehensive overview.
How Does it Work?
The web server compares the URL requested by the client with the search pattern in the individual rules. For example:

RewriteRule ^/netscape\.html$ /mozilla.html
The first argument contains the search pattern and the second the replacement string; a third argument could contain flags. When the "/netscape.html" URL is requested, mod_rewrite instead returns "/mozilla.html". Since the search pattern is a regular expression, the period it contains needs to be protected by a backslash to prevent it being interpreted as a wildcard.
A "RewriteRule" can be preceded in the configuration file by one or multiple "RewriteCond" directives, each comprising a variable to compare, a search pattern, and possibly a flag. They are used to provide additional conditions: the replacement defined in the "RewriteRule" only applies if all the rewrite conditions are fulfilled:

RewriteCond %{TIME_HOUR}%{TIME_MIN} <0600
RewriteRule ^/special/lunch\.html$ /it_is_too_early.html
This example concerns a request for a lunchtime special. The variables "TIME_HOUR" and "TIME_MIN" are concatenated: a request entered at quarter past eight in the evening would thus produce the string "2015". The expression "<0600" is then applied to the string. If the comparison is successful, the subsequent "RewriteRule" is applied – instead of the special, the content of the file "/it_is_too_early.html" is returned to the user. As this example shows, some special conditions apply to regular expressions in mod_rewrite; refer to Table 2 for an overview. In addition to the variables defined in the CGI specification, mod_rewrite also offers a few extra goodies that allow you to query server information and the time. Table 3 shows the most commonly used variables.
Note that the order in which mod_rewrite parses the search keys contradicts the conventions adhered to by most programming languages. A "RewriteCond" always precedes the corresponding "RewriteRule", but the server will still parse the search pattern in the "RewriteRule" first, before going on to check the conditions – in this case adhering to the order stipulated in the configuration file.
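To illustrate the server variables from Table 3, here is a hedged sketch – the file name is invented – that serves a text-only start page to the Lynx browser by testing "%{HTTP_USER_AGENT}":

RewriteEngine on
# Match any user agent string beginning with "Lynx", case-insensitively
RewriteCond %{HTTP_USER_AGENT} ^Lynx [NC]
RewriteRule ^/$ /text/index.html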
Tidy Appearance
One typical application of mod_rewrite is harmonizing URLs. You often find that resources can be accessed via a variety of addresses, although only one of them is the official address. Uniform URLs are particularly important for rankings on search engines: if a page can be accessed via a variety of names, each one of these will achieve fewer hits, your ranking drops, and your page is not so prominent on Google. However, you will certainly not want to remove the alternative URLs, to avoid impacting availability to users. There is a solution to this issue: you can allow any possible alternatives, but redirect any users who type an unofficial URL to the official address.
Listing 2a: Harmonizing File Requests
RewriteEngine on
RewriteRule ^/download\.html   /download/    [NC,R=301]
RewriteRule ^/downloads\.html  /download/    [NC,R=301]
RewriteRule ^/downloads/(.*)   /download/$1  [NC,R=301]
Listing 2a shows an example: the first two rules redirect requests for "download.html" and "downloads.html" to the "/download/" directory. The "[NC]" flag means "no case" and refers to upper/lower case letters; the condition will thus equally apply to "DownLoad.html". "[R=301]" ensures that the user or search engine will notice they are being redirected, instead of the redirection being performed transparently on the web server: browsers will immediately follow the redirection, whereas a robot or spider will immediately update its records. The third and final "RewriteRule" points the content of "/downloads/" (with an s at the end) to "/download/" (without an s). The trick is that, if there are links pointing to files in "/downloads", the file name (possibly including subdirectories) is retained. The user will immediately be pointed to the requested file and will not need to climb back up the tree from "/download/".
Listing 2b is more complex: it prepends "www." to any requests. The web server is also configured to react to "company.com", but users should end up at "www.company.com". The first "RewriteCond" filters out any requests already containing "www.", as they do not need to be re-written. The second blocks any requests without a "Host:" header – some older clients may still produce them, and redirection would be fatal in this case. The "RewriteRule" finally stores the path of the current request and uses it to create a new, official URL.
Virtual Hosts With a Difference
Now let's look at a more complex practical task. Suppose a company that runs the "company.com" domain creates the "imode.company.com" subdomain for mobile phone fans. The web content for this subdomain is stored in the subdirectory "i" below the "DocumentRoot". The web server is hosted externally – which prevents the webmaster from modifying "httpd.conf" to set up an independent virtual server. The only way to solve this issue is to use an ".htaccess" file, as shown in Listing 3.
Listing 2b: Harmonizing Host Names
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{HTTP_HOST} !^$
RewriteRule ^(.*) http://www.%{HTTP_HOST}/$1 [R=301]
Listing 3: Virtual Hosts with mod_rewrite
RewriteEngine on
RewriteCond %{HTTP_HOST} ^imode\.company\.com$ [NC]
RewriteCond %{REQUEST_URI} !^/i/
RewriteRule ^(.*)$ /i/$1
Listing 4: Static URLs instead of Query Strings with mod_rewrite
RewriteEngine on
RewriteRule ^/(animals|plants)/(.+)/(.+)\.html$ /template.php?menu=$1&submenu=$2&subsubmenu=$3 [PT]
The first "RewriteCond" compares the "Host:" header contents with the regular expression "^imode\.company\.com$". Thus, the rewrite is restricted to requests directed to "imode.company.com". The "RewriteRule" itself simply takes the entire request and saves the expression that matches the string in parentheses in a group. Now mod_rewrite can replace the URL with "/i/" followed by the same group. If a user wants to access the "mail.html" page, for example, this produces the "/i/mail.html" address.
The second "RewriteCond" prevents the system from entering an indefinite loop. None of the conditions previously checked is changed when the file path is replaced, so mod_rewrite would indefinitely prepend "/i/" to the request – until the user gave up. The second condition therefore filters out any requests that already contain "/i/", thus preventing the loop.
Using Dynamic Scripts to Produce Static Pages
Many search engines seem to be allergic to query strings. If you go to the trouble of implementing a database-based template system in PHP, you may find yourself dropping down the ranks on the search engines. This is due to the fact that the spider notices the give-away question mark in the URL and, as a precautionary measure, does not index any subsequent pages. This is another case for mod_rewrite: where the template system would use a URL something like "http://somewhere.com/template.php?menu=animals&submenu=fish&subsubmenu=shark", the world outside will be shown a nice static URL, such as "http://somewhere.com/animals/fish/shark.html". Listing 4 shows the corresponding entry in "httpd.conf". The search key comprises three groups, the first of which must contain either the string "animals" or the string "plants"; this avoids impacting any other files or directories. Mod_rewrite uses all three groups to construct an internal URL, which is then called by the PHP script. The "[PT]" flag ensures that any "Alias", "Redirect", and "ScriptAlias" directives are applied to the result. Thus, the example will not only work for PHP, but also for genuine CGI scripts written in Perl or a similar language.
Table 1: Important flags for "RewriteCond" and "RewriteRule"
[NC]     Ignore upper/lower case for comparisons.
[OR]     Links one "RewriteCond" to another using a logical OR. Normally, all existing conditions need to be fulfilled for a replacement to take place; this flag permits several variants.
[R=301]  Performs an external redirection. mod_rewrite is normally transparent for the user: the status line in the browser will show the URL the user originally typed, although this URL has in fact been re-written. However, if you want to draw the user's attention to the redirection, you might prefer to use external redirection. The browser will then receive a status code and the URL of the new page, and will subsequently actively request this page. The status code to be returned to the browser is defined by the number that follows "R=": "301" (permanently moved) and "302" (temporary redirection) are typical codes.
[L]      Terminates mod_rewrite processing without applying any more rules. This flag prevents a correction that has been performed from being overwritten by a later rule. It also saves the administrator some confusion.
[N]      Runs the newly defined URL through any applicable mod_rewrite rules again.
[C]      Only process the next "RewriteRule" if the current rule applied.
Table 2: Regular Expressions in mod_rewrite
.           Any character.
\.          A period.
.+          One or more characters.
\.\+        A period followed by a plus sign.
.*          Zero or more characters.
.?          Zero or one character.
^x          "x" at the start of a URL or a file name.
x$          "x" at the end of a URL or a file name.
x|y         Either "x" or "y".
(.*)        Group: the text matched by ".*" is stored in the "$1" variable in the case of a "RewriteRule", or in the variable "%1" if the regexp is used in a "RewriteCond". If you use multiple groups in a single expression, the variables "$2", "$3", … or "%2", "%3", … are used. These variables can be used in the replacement string.
(x|y)       A different example of a group: searches for "x" or "y" and stores the matching text.
[-0-9a-z]*  Any number of lower case letters, figures or dashes.
[^/]*       Any number of characters, but not the slash character.
!regexp     This expression is true if the regexp is not found.

Additionally, there are a number of special tests. The following operators do not search for regular expressions, but compare a string with another, or check for a file name or URL.
<4500       The comparative expression is less than 4500. Note: this is not a numeric but an ASCII comparison.
>4500       The comparative expression is greater than 4500.
=""         The comparative expression is an empty string.
-d          The comparative expression points to a directory.
-f          The comparative expression points to a normal file.
-s          The comparative expression points to a normal file that is not empty.
-l          The comparative expression points to a symbolic link.
-F          The comparative expression points to a normal file that can be read by the current client.
-U          The comparative expression points to a valid URL that can be read by the current client.
Future
What you have seen so far is just a taster of what mod_rewrite can do: this module's applications are as unbounded as its complexity. Refer to [2] for a collection of useful and useless examples of practical applications. If you intend to experiment with mod_rewrite, use a lab system first. Mod_rewrite is quick enough for production systems, but unexpected configuration errors can take a system down. ■

Table 3: Interesting Variables for "RewriteCond" Directives
%{HTTP_ACCEPT}       Media types accepted by the client, for example "text/plain" or "audio/*".
%{HTTP_COOKIE}       Cookies set for the client.
%{HTTP_HOST}         Domain name of the virtual host queried.
%{HTTP_REFERER}      Page with a link to this page (can be omitted).
%{HTTP_USER_AGENT}   Client, such as "Mozilla/4.0".
%{QUERY_STRING}      Query string transferred by a GET form.
%{REMOTE_ADDR}       Client IP address.
%{REMOTE_HOST}       Domain name of the client, if known.
%{REMOTE_USER}       User name of the client, possibly after successfully completing authentication.
%{REQUEST_URI}       The URI requested by the client.
%{REQUEST_FILENAME}  The corresponding file on the local file system.
%{SERVER_ADDR}       The web server IP address.
%{TIME_DAY}          Current date, day.
%{TIME_MON}          Current date, month.
%{TIME_YEAR}         Current date, year.
%{TIME_HOUR}         Current time, hour.
%{TIME_MIN}          Current time, minute.
%{TIME_SEC}          Current time, second.
%{ENV:PATH}          "$PATH" environment variable for Apache.
%{HTTP:Connection}   "Connection:" header in the HTTP request. This syntax allows you to query other headers too.

INFO
[1] Original Documentation: http://httpd.apache.org/docs-2.0/mod/mod_rewrite.html
[2] A Treasure Trove of Tips, Tricks and Examples: http://httpd.apache.org/docs-2.0/misc/rewriteguide.html

THE AUTHOR
Marc André Selig spends half of his time working as a scientific assistant at the University of Trier, and the other half as a doctor at the Schramberg hospital. If he happens to find time for it, his current preoccupation is programming web-based databases on various Unix platforms.
Dynamic Webpage Compression with Mod_gzip and Apache
Teaching Penguins to Fly
HTML files compress quite well. Content encoding and compressed transfers allow vast improvements in the effective transfer rate of a web server. The Mod_gzip module for Apache provides Linux-based web admins with jet propulsion for their favorite penguin. BY ULRICH KEIL
Despite a modern modem or ISDN connection, you sometimes feel that a mere dribble of data is getting through to you from the Web. The reasons for slow page build-up are many and varied:
• inadequate server hardware resources,
• slow transfer of data across the Internet, or
• a slow Internet connection on the part of the end user.
A sysadmin can normally update the server hardware with a minimal investment, and there is certainly very little one can do to remove the Internet bottleneck. Thus the last mile – the end user's, or client's, internet connection – provides the greatest potential for enhancing performance. The majority of Internet users still do not have access to a broadband connection and are forced to resort to modems or ISDN. This means that any data will have to cross the telephone line bottleneck, no matter how quickly it is generated and served up. Figure 1 shows the download time for a 300 kbyte HTML file using a 28k modem.
Content Encoding
Content encoding was introduced in 1999 as part of the HTTP 1.1 standard and allows you to compress the content to be transferred using the GZIP/ZLIB compression algorithms. Common web browsers (MS Internet Explorer version 4.0 or later, Netscape Navigator version 4.0 or later, Opera, Lynx, W3m) transfer the "Accept-Encoding: gzip, …" string with the HTTP request to inform the server of the compression techniques they support, if any. The server will then compress any data it needs to transfer, and the client will in turn decompress the data and continue to process it.
These steps are performed transparently, that is, the user will not notice anything – apart from an obvious performance gain. If a client that does not support HTTP 1.1, such as a search engine, a proxy, or an old browser, sends a request, the data will simply be transferred as it is. Since a compressed HTML file is only a fraction of the size of the original (20 to 30 percent), transfer rates can be improved considerably by compression techniques. Figure 2 shows the throughput of a 56k modem, which can achieve values of more than 30 kbyte/s for compressed data.
Friends in need: Mod_gzip
Although it might seem obvious that compression must cause an additional load on the server, field tests have shown that the opposite is in fact true. Because client requests take up less of the httpd child processes' time, system resources can be released more quickly, which in turn allows more requests to be processed. If the web server is hosted by a provider that charges you for traffic volume, you can use content encoding to achieve considerable traffic reductions, and thus save a substantial amount of money.
Figure 1: Download time for a 300 kbyte HTML file using a 28k modem – without compression, and with 2:1, 4:1, and HTTP compression
The Mod_gzip module, available under the Apache license, provides a complete implementation of the content encoding standard (part of HTTP 1.1, RFC 2616) for Apache 1.3. The module was developed by Hyperspace Communications, is currently available as version 1.3.19.1a, and is part of the commercial Hyperspace package. It is capable of compressing both static and dynamically generated content on the fly, and is used by freshmeat.net, slashdot.org and webhostlist.de, just to name a few. Although it is possible to link Mod_gzip into Apache statically, installing Mod_gzip as a DSO module is probably a better alternative, since it saves you the time of recompiling Apache, and the entire installation process can be completed in less than a quarter of an hour.
Installation: Plug & Play
After downloading the sources from [1], make sure you are the superuser, root, and type the following:

apxs -i -a -c mod_gzip.c

to compile and install the module. The entries required to load Mod_gzip must be placed in "httpd.conf". But before you can finally use the module, you must insert the configuration directives from Listing 3 into "httpd.conf". All that remains now is to restart your Apache by typing "apachectl restart". Mod_gzip is now ready and willing to deal with user requests.
Just a note at this point: although it is theoretically possible to compress any data transferred via HTTP, under normal conditions it only makes sense to compress ASCII files, where a considerable reduction in size can be achieved (for example .html, .pl, .php, .txt). Double compression of files such as images will normally cause an overhead that delays the data transfer.

Listing 1: "ramdisk.sh"
#!/bin/bash
# Initialize the ramdisk
dd if=/dev/zero of=/dev/ram bs=1k count=4096
mke2fs -vm0 /dev/ram 4096
# Create a mountpoint
mkdir -p /var/cache/ramdisk
chown nobody.nobody /var/cache/ramdisk
chmod 770 /var/cache/ramdisk
# Mount the ramdisk
mount -t ext2 /dev/ram /var/cache/ramdisk
Listing 2: "compress.sh"
#!/bin/bash
ARGS=2
E_BADARGS=65
if [ $# -ne $ARGS ]
then
    echo "Syntax: `basename $0` path suffix"
    echo "Example: `basename $0` /home/httpd html"
    exit $E_BADARGS
fi
for directory in `find $1 -type d`
do
    for filename in $directory/*.$2
    do
        if [ -f $filename ]
        then
            gzip -c9 -v $filename >$filename.gz
        fi
    done
done

Figure 2: Compression improves the client data throughput by several orders of magnitude. This figure shows a 56k modem with a throughput of more than 30 kbyte/s.
INFO
[1] Mod_gzip Homepage: http://www.remotecommunications.com/apache/mod_gzip
[2] Michael Schröpl's Mod_gzip Page: http://www.schroepl.net/projekte/mod_gzip
[3] Content Encoding via Perl Scripts: http://www.schroepl.net/projekte/gzip_cnc
[4] Performance Test on the Hyperspace Homepage: http://www.ehyperspace.com/html/solutions/performance.html
More Speed?
For files that exceed the size defined in "mod_gzip_maximum_inmem_size" (64 kbytes by default), Mod_gzip will create a file in the temp directory, although this causes hard disk activity and therefore wastes time. To increase your Apache's performance, you might like to swap your temp directory out to a RAM disk. To do so, your kernel will need RAM disk support. You can use the following rule of thumb to calculate the size of the disk: maximum number of simultaneous requests multiplied by 100 kbytes – this should allow you enough leeway to process requests even at peak times. However, since the kernel will only support RAM disks of 4 Mbytes by default, you may need to build a new kernel, specifying the required size under "Block Devices | Default RAM disk size" in the kernel configuration. After ensuring that a properly dimensioned RAM disk is available, you can use Listing 1 to initialize and mount the disk. Now simply add

mod_gzip_temp_dir /var/cache/ramdisk

to "httpd.conf" – and after a restart your Apache will save resources by writing temporary files to memory.
Since a RAM disk is volatile and disappears into the happy hunting grounds when you restart your computer, you might also like to add Listing 1 to an init script. This script needs to be run before you launch your Apache, to initialize the RAM disk automatically on rebooting.
If you want your Apache to go faster than the speed of light, note that Mod_gzip allows you to serve up precompressed files. This skips the dynamic compression step, reducing the interval between the request and the response by more than half. If you add the following entry to "httpd.conf"

mod_gzip_can_negotiate Yes
the web server will check for a precompressed version of the file to be served up (with the ".gz" suffix) and return this to the client. If there is none, the file is compressed on the fly and returned. Listing 2 will help you create a compressed version of any files with a specific suffix (for example, ".html" or ".txt") in a specific directory and its subdirectories. If you require additional information on Mod_gzip, you might like to visit Michael Schröpl's site [2]. Michael has a project of his own [3] – a content encoding solution based entirely on scripts – that will be of interest to readers with FTP-only access to their web servers, who are unable to install Mod_gzip themselves. Anyone who would like to feel the difference that compression can make to a download should also try out the Hyperspace website [4]. ■
Listing 3: "httpd.conf"
# Enable mod_gzip
mod_gzip_on Yes
# Temp directory (must be writable for Apache)
mod_gzip_temp_dir /tmp
# Keep temporary files? Set this to Yes for debugging only
mod_gzip_keep_workfiles No
# Exclude browsers whose content encoding implementation is faulty
mod_gzip_item_exclude reqheader "User-agent: Mozilla/4.0[678]"
# Exclude CSS and Javascript, since Netscape 4 cannot
# decompress these files properly
mod_gzip_item_exclude file \.js$
mod_gzip_item_exclude file \.css$
# Limit file size (in bytes): compression causes an
# overhead for files <0.5 kb
mod_gzip_minimum_file_size 500
# For files >1 MB compression causes a delay in serving
mod_gzip_maximum_file_size 1000000
# All files <60 kb can be compressed in memory and need
# not be transferred to the temp directory
mod_gzip_maximum_inmem_size 60000
# Files to be compressed?
# HTML:
mod_gzip_item_include file \.htm$
mod_gzip_item_include file \.html$
# Text:
mod_gzip_item_include mime text/.*
# Scripts:
mod_gzip_item_include file \.pl$
mod_gzip_item_include file \.php$
mod_gzip_item_include handler ^cgi-script$
# And finally a logfile
LogFormat "%h %l %u %t \"%V %r\" %>s %b mod_gzip: %{mod_gzip_result}n In:%{mod_gzip_input_size}n Out:%{mod_gzip_output_size}n:%{mod_gzip_compression_ratio}npct." common_with_mod_gzip_info
CustomLog /var/log/httpd/mod_gzip common_with_mod_gzip_info
THE AUTHOR
Ulrich Keil studies Computer Science in Mannheim and works part-time as a system administrator for 9 Net Avenue. In his leisure time, Ulrich works as a volunteer emergency medical technician, and still finds time to take care of his personal SparcStation 10, which still faithfully serves up his homepage at http://www.der-keiler.de.
New and improved – Apache 2
Rules of Succession
Whether they start now or in a few months, administrators will soon need to think about making the jump from Apache Server 1.3 to version 2. This article discusses important new features and provides decision-making guidelines. BY THOMAS GRAHAMMER

After two and a half years of development, the Apache Software Foundation [1] finally declared version 2.0.35 of the Apache web server [2] "fit for public use" in April [3]. When this issue went to print, 2.0.40 was the current version. But it seems that many administrators are still not prepared to risk updating. For those of you still wavering, this article will point out some of the advantages the new web server offers, and those of you who have decided to go for the new version will certainly be interested in avoiding the traps.
What’s new?
A whole bunch of new features in Apache 2.0 immediately catch the eye:
• Build system: The build system now uses autoconf and libtool. The installation follows the usual pattern after expanding the archive: "./configure --prefix=PREFIX"; without the "--prefix" option the target path is "/usr/local/apache2". This step is followed by "make" and "make install". You will need an ANSI C compiler, such as GCC; Perl version 5.003 or better is optional. You can either compile the modules you require into the executable, as in Apache 1.3 ("--enable-module"), or load them as Dynamic Shared Objects ("--enable-module=shared") at runtime. The new Apache Extension Tool "apxs" is used for creating DSOs.
• Configuration: The administrator modifies the "<Prefix>/conf/httpd.conf" file to configure the program. Many of the previously confusing configuration instructions have been simplified or removed entirely, as is the case for the "Port" and "BindAddress" instructions. Now you only need the "Listen" instruction to set IP addresses and port numbers; the server name and the port number, used for redirection and recognizing virtual servers, are configured using the "ServerName" instruction (see the sketch after this list).
• IPv6: Apache will use IPv6 listening sockets on systems where the portable runtime library (see the section on the APR) supports IPv6. Additionally, the configuration instructions "Listen", "NameVirtualHost" and "VirtualHost" can handle IPv6 addresses, for example "Listen [fe80::1]:8080".
• Modules: Mod_ssl is now officially part of the Apache package. Mod_proxy has been largely reprogrammed, and several functions have been split out into a number of Mod_*cache modules. The Mod_deflate compressor is new and may replace Mod_gzip (see the article on page 26).
• Filters: Apache modules can now be used to filter incoming and outgoing data streams. You can thus filter CGI script output using the "INCLUDES" filter contained in Mod_include, which allows you to execute server side includes.
• Multilingual: Apache stores error messages intended for client browsers in multilingual SSI documents. The administrator can modify them to reflect corporate design.
• Multi-protocol support: Apache 2.0 provides a platform capable of multi-protocol support. (The documentation for this feature is unfortunately extremely thin and prevented us from ascertaining the practical advantages this feature may offer.)
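As announced in the list above, here is a minimal sketch of the simplified address configuration in Apache 2.0 – the addresses and host name are invented for illustration:

# One directive replaces the old Port and BindAddress instructions
Listen 192.168.0.10:80
Listen [fe80::1]:8080
# Name and port used for redirects and virtual host recognition
ServerName www.example.com:80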
Under the Hood: Yesterday and Today
Apache supported only the Unix operating system up to version 1.2. In contrast to Microsoft Windows, Unix operating systems are capable of copying processes (so-called forking). Apache thus ran as a preforking server: when launched, the parent process creates a number of instances of its own process – as defined in the configuration file – and these processes listen for HTTP requests. If the number of requests exceeds the number of processes available to receive them, additional instances of the original process are launched. Apache version 1.3 was ported to Windows with a great deal of effort (and to Netware 5 and IBM's Transaction Processing Facility). To do so, the developers had to completely re-write the process engine: the so-called thread version of Apache 1.3 is required for Windows, while Unix/Linux can use the process variant. To facilitate porting Apache 2.0, the developers abstracted the platform-specific code segments from the remaining Apache code and placed them in the APR and MPM sources (see below). This kind of modularity also facilitates platform-specific optimization.
The Apache Portable Runtime APR
In contrast to previous versions, Apache 2 no longer uses Posix interfaces directly. Poorly implemented or slow Posix libraries or emulations meant that the server did not perform well on non-Unix operating systems. The Apache Portable Runtime (APR) library, introduced to replace Posix, was programmed by the Apache developers to place an abstraction layer between the operating system and Apache 2. The API provided by the APR contains the basic functionality of a virtual operating system, including file and network I/O and memory management, as well as thread and process management. By preference the APR will always use native calls to the operating system. Additionally, the APR methods emulate the former Posix methods to facilitate porting older code to the APR. The goal achieved by APR version 1.0 was to provide those functions required for Apache 2.0. There are plans to develop the APR independently of Apache, as a basis for platform-independent program development, and to make it available to interested programmers.

Figure 1: Useful information on upgrading to 2.0
Figure 2: The Apache MPM site
Multiprocessing Modules
Apache 2 uses special modules to abstract the code that managed processes and threads in version 1.3. It is these Multi-Processing Modules' (MPMs) task to pass incoming HTTP requests to simple execution units, which will in turn process the request. The MPM in use specifies whether to use processes or threads. This kind of modularization provides for a clearly structured Apache 2 source code and offers several other advantages. The Apache developers expressly allow MPMs to use operating system specific code. MPMs of this type can only be used on one operating system, but their performance will be far superior – and this is particularly evident on non-Unix operating systems. Bill Stoddard, an Apache developer, achieved a performance boost of 50 percent for static websites on Windows.
MPMs on Unix
There are several MPMs available for Unix. Each one of them takes a different approach to how the web server deals with incoming HTTP requests. Webmasters can therefore choose the variant best suited to their applications by linking the MPM into the Apache binary (see also the boxout "Ready-Made MPMs in Apache 2.0"). Threads use fewer resources than processes on Unix operating systems – and on Linux. On the other hand, a process-based approach does provide better stability, as a faulty thread can bring down its parent process.
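For the default Prefork MPM described in the boxout below, the process pool is tuned with a handful of directives in "httpd.conf". A hedged sketch with invented values:

<IfModule prefork.c>
    # Child processes created at startup
    StartServers         5
    # Lower and upper limits for the pool of idle children
    MinSpareServers      5
    MaxSpareServers     10
    # Hard cap on simultaneous requests
    MaxClients         150
    # Recycle children periodically to contain memory leaks
    MaxRequestsPerChild 1000
</IfModule>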
Update or Install from Scratch?
Since there have been some serious changes to the architecture of the Apache web server, you will definitely want to install from scratch [5], rather than attempting an update. After trying both approaches, I discovered that version 2.0 of Apache attempts to reuse the original 1.3 modules if you perform an update. Due to changes in the base technology, 1.3 modules cannot be used without some modifications. You will also be unable to re-use the configuration files – particularly "httpd.conf" – as many configuration instructions have been simplified or simply removed.

Is it Worth Changing?
Changing versions means a lot of work – you will need to install your Apache from scratch, including completely re-configuring all the settings. The standard version 1.3 modules are available in Apache 2.0 (see Figure 1), but existing third-party modules will not be available for the time being. Administrators will have to decide for themselves whether upgrading is worthwhile. If you really use the features we just discussed, such as the Perchild MPM, you should probably go for the upgrade, despite the work involved. If the features merely appeal to you, or if you think they are a waste of time, you might prefer to stick to the old adage: never change a running Apache. ■

Ready-Made MPMs in Apache 2.0
Prefork (default for Unix platforms)
This MPM implements typical Apache 1.3 behavior in Apache 2.0: a parent process creates a pool of child processes for the incoming HTTP requests. The options "MinSpareServers" and "MaxSpareServers" are used to set the lower and upper limits for the child process pool. If the number of free processes drops below the number defined in "MinSpareServers", new processes are launched; if the number exceeds "MaxSpareServers", Apache will remove processes from memory. Since each process handles only one request, an error in one process will drop only one connection to the server. This is a great advantage if you work with dynamically generated pages.
Threaded
This MPM is similar to Prefork, the main difference being that every Apache 2 process can run multiple threads. The configuration option "ThreadsPerChild" is used to specify how many. If Threaded is used, the httpd keeps count of the number of unused threads remaining. The "MinSpareThreads" and "MaxSpareThreads" directives stipulate the number of unused threads that can occur before creating new processes or removing processes from memory. This MPM thus uses multiple processes and threads – a genuine enhancement in comparison to Apache 1.3.
Dexter
The Dexter MPM also uses both processes and threads. In contrast to Threaded, the number of processes is clearly defined, whereas the number of threads per process depends on the current server load. The configuration statement "NumServers" specifies the number of processes to be created on launching the web server; the initial number of threads is defined in "StartThreads". "MinSpareThreads" and "MaxSpareThreads" define how the number of threads will be adjusted to reflect changing loads on the server.
Perchild
Perchild is based on the Dexter MPM, but adds an additional function that web hosters will appreciate. Just like Dexter, Perchild uses a set number of processes that spawn threads. In order to run multiple virtual hosts with different privileges, Perchild assigns user and group IDs to the processes. The "ChildPerUserID" directive specifies how many processes will run under a specific user ID, while the "AssignUserID" statement within the context of a "VirtualHost" assigns a user ID to a specific process.
Modules for Other Operating Systems
There are additional MPMs for Windows, OS/2 and BeOS, which are automatically linked on the respective platforms by the build system.

INFO
[1] Apache Software Foundation: http://www.apache.org
[2] Apache Project: http://httpd.apache.org
[3] Release List: http://www.apacheweek.com/features/ap2#rh
[4] List of Apache 2 Modules: http://httpd.apache.org/docs-2.0/mod
[5] Installation: http://httpd.apache.org/docs-2.0/install.html

THE AUTHOR
Thomas Grahammer has a university degree in Computer Science and is in charge of software development at a Munich-based software company. He is also a freelance software developer, with in-depth knowledge of databases and Apache, and holds professional qualifications from SuSE and PHP.
Structural and Development Model of the Apache Projects
Chiefs and Indians
Apache is more than just a web server. The Apache Software Foundation (ASF) provides a home for some of the most significant developments in the world of free Java and XML software, and even PHP. But what makes the ASF so attractive to developers? BY ULRICH WOLF

When IBM was negotiating a contract with the folk from Apache way back in 1998 – IBM wanted to integrate the web server into their Websphere product – the enterprise's attorneys were more than dumbfounded. The other party to the contract was not an organization. When asked what kind of organizational structure Apache had, the answer was: "It's a web site." "Did I get that right – we are negotiating a contract with a web site?" But the period of creative chaos is a thing of the past at Apache. The former NCSA server patch ("a patchy server"), as one of the largest and best organized free software projects, has become a kind of role model, with the Apache Software Foundation (ASF) deriving partly from the experience with IBM. The Foundation's task is to support the projects running under Apache's large umbrella both financially and organizationally. The web server itself, which is simply referred to as the HTTPD project in Apache circles, is just one small, albeit important, part.

Web Server as a Name Giver
There are subprojects, which are often divided into sub-subprojects – you can see the effect of modularization even here. The most important Java and XML projects of the Open Source community are now Apache projects, as is the popular PHP web language.

Figure 1: Apache's structure – the subprojects (Perl, PHP, Jakarta, XML, HTTPD, TCL, APR) sit under the Apache Software Foundation with its Board of Directors, officers and the ApacheCon committee of five ASF members. The individual subprojects are mostly autonomous; the Jakarta Java project is the largest by numbers.
In the case of Jakarta, the Apache Java project, the number of subprojects has risen to 23, and they include heavyweights such as the Tomcat application server. A total of more than 160 developers regularly contribute code to the Jakarta project. This puts Jakarta at the top of the list of Apache projects (Figure 1). Apache XML has 125 main developers assigned to no less than 18 subprojects, ranging from A for AxKit, a web publishing tool, to X for the XML parser Xerces. Even SOAP, the Simple Object Access Protocol that is becoming increasingly popular for web services, is an Apache XML subproject. The third largest project by numbers is the web server itself, that is, the HTTPD project, with 65 active developers. Even the web programming language PHP with all its sub-branches is part of the Apache project. There are approximately 15 core developers at present, but this does not include those who regularly contribute to associated projects such as documentation or add-ins and applications. These developers have been swapped out to the Pear project (PHP Extension and Application Repository).
Normally, with an organization the size and complexity of the Apache project, you would automatically assume that a hierarchy would be in place, with command chains and formalized decision-making processes. However, there is little evidence of this in the case of the Apache project. Instead, there are a few mandatory rules.
More Work, More Say
The Apache Software Foundation describes itself as a meritocracy, that is, a rule of the deserving. People who put in a lot of work also have a lot of say, and this is the principle used for admission to the Foundation, or even to its Board of Directors. The Board works just like that of any other American corporation, electing officers for management roles: a Chairman, a President, a Treasurer, and three Vice Presidents. Each subproject has its own structure and rules that allow the developers involved to reach decisions; there is no move towards harmonization. In many cases, as ASF member Lars Eilebrecht reports, a lazy or rough consensus model will eventually emerge: if a suggestion does not meet with disapproval, you can consider it approved.
In the case of the HTTPD project, anybody can veto a patch, but the objections have to be well-founded. A typical case would be discovering an error in a patch that could give rise to further issues. Cases like this are normally resolved quickly, although lengthy discussions (which are resolved by a majority decision) tend to ensue in the case of general design decisions. As Lars Eilebrecht reports, the ASF does not receive any earnings from partnerships with other enterprises. Additionally, the ASF will never appear as a partner of an enterprise. However, there are so-called Memoranda of Understanding, for example between the Apache Software Foundation and the manufacturer Sun Microsystems.
Apache and Money
The ASF has two sources of income at present: firstly, donations (the ASF can issue donation receipts as a non-profit organization); secondly, additional earnings come from a share in the profits of Apache conferences. Although membership is restricted to individuals and is not permitted for companies, there is obviously some collaboration with the IT industry. Five members work for IBM, and there are two members from Sun. Brian Behlendorf's startup company, Covalent Technologies, an enterprise whose core business is Apache-based products and support, is particularly prominent. The Apache project is living proof that the world of Open Source is not only a competition for workable solutions and good code, but also all about successful development models. Developers seem to regard the ASF's approach as closer to the mark than many others, and this has made the ASF a magnet for projects.
Competition Between Development Models
It may have something to do with the largely autonomous approach of the subprojects or the ideological freedom of the decision makers, and it certainly has something to do with the attractiveness of the web server as a role model for free software rubbing off on the other Apache projects. Even so, the ASF tends to turn down additional projects, due to the increased coordination demands placed on it. The relatively friction-free cooperation with IBM or Sun is probably due to the Apache license, which is extremely easy on the IT industry and, in contrast to the GPL, for example, allows integration into proprietary software. We will probably see another demonstration of the popularity of the Apache project at the ApacheCon starting 19 November in Las Vegas. Last year the congress was cancelled when the organizer, Camelot, went bankrupt. The new organizer, Security Travel, also hosts the Black Hat and Def Con security conferences, and is thus conversant with hosting similar events. Five members of the Apache Software Foundation will be taking care of the technical side of the congress. ■
REVIEWS
Better e-mail management
Mail Servers
If you value your email, you will want to have full and total control over it. Relying totally on the email provisions of your ISP may not be enough. Here are the details that will put you on the right road to setting up your own e-mail servers. BY COLIN MURPHY

What's wrong with having clients taking mail? Mail handling should not be the responsibility of a desktop application. Letting Kmail, or whichever other client you use, take your mail down from an ISP is missing the point of running a Linux system. If you are working with other users on a network, or even on a single workstation with multiple logins, mail can get lost or misplaced into the wrong account. It is even more impractical to use only desktop clients if you are looking to collect and distribute mail for a group of people, and more so again should you be thinking of doing this through your own domain name and not that of your ISP. You will also find advantages if you get lots of daily emails, because you can use better filtering facilities to control how your email is presented and prioritized, and you will have more control over how you maintain backups and mergers. You will be able to control spam better, too, using applications like SpamAssassin. If you are using a stand-alone system as a workstation, you won't need to worry about taking on the responsibility of running your own email server – but if you are responsible, you may very well want to. There is nothing more annoying and frustrating than finding that you are being denied access to the most fundamental of internet services, your email, just because your ISP hasn't made enough provision and their email server has failed you. If you run your own mail servers, then you have access to an alternative system. Responsibility may not be your only issue; you may also find yourself wishing for better features not present in your email client – it may not have the filtering support you need, for instance. In this article we will give you some highlights and guidance on how you might improve the email provision on your machine.

How email gets from A to B
If you are going to take a more hands-on approach to how email is handled on your machine, you will need to start out with an understanding of the processes that it goes through to get to and from your machine.
Unfortunately for simplicity's sake, there is no single chain of processes to follow; it depends very much on how much control you are willing to pay for and who is providing you with your email addresses. At the beginning of the process if you are sending an email, or at the end if you are receiving one, is the mail user agent (MUA). This is the client-side application you use to read and write your emails with (see the box "E-mail clients" for a brief list).
The important thing about the MUA, from the mail server's perspective, is not how it handles the formatting of the email text or helps with its composition, or anything like that. Its fundamental roles are:
• To make sure that the email is in its own discrete block, so that it is still recognized as being your email once it has come out of the other end of the system, having been lumped together with lots of other emails, as happens in the next stage with the help of the Mail Transport Agent (MTA).
• The creation of the header part of the email message, which contains the useful information, such as where the email is going to and who to advise should it fail in its mission.
Now that the email message has been composed and is in its own discrete bundle, the act of serving the email messages can begin, with the MUA passing this bundle on to the MTA. The MTA routes messages between machines. There are many different MTAs available for Linux, and we will give a round-up of the four most popular: Sendmail, Postfix, Exim and Qmail. The MTA looks at the details in the header section of the email and decides how best to get the message to the recipient. If it is local mail, for local people, it will pass it to the delivery agent on that local machine. Should the mail be for some far-flung place, it will try to pass the email on to the designated server, as shown in the MX record of the recipient's DNS entry. It can also take some spam-blocking actions as part of its work, confirming the authenticity of the email header and the route that it took. If the mail is not yet on a server in the local domain, another lookup is done to pass it on to the next server in line. If this server is in the local domain of the recipient, the message is almost delivered, and so it is passed on to the local Delivery Agent.
delivered and so passed on to the local Delivery Agent. Sometimes this local Delivery Agent role is handled by the MTA, but there are other options. Procmail is the Delivery Agent of choice for Linux users at the moment. This is where the receiving side gets some control as to what happens to the email from here, both as system administrator and, ultimately, the email’s recipient. In its simplest form the email will just add the email received to the mail spool that is to be used later by an MUA. Many do more with it though, getting their mail grouped and filtered and even have applications run on receipt of certain emails.
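To give a flavour of that control, here is a minimal ~/.procmailrc sketch – the folder names and the mailing list address are invented for illustration, not taken from any particular setup:

# ~/.procmailrc – a minimal filtering sketch
MAILDIR=$HOME/Mail              # mail folders live here
DEFAULT=$MAILDIR/inbox          # unmatched mail ends up here
LOGFILE=$MAILDIR/procmail.log

# File traffic from a (hypothetical) mailing list into its own folder
:0:
* ^TO_mailinglist@example\.com
lists

Any message that matches none of the recipes simply falls through to $DEFAULT, so a misfiring rule loses you nothing.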
Popular MTAs

There are many different types of mail transfer agent available for Linux. Here are four of the most popular open source MTAs in use today.
Sendmail

Being the first MTA, with a legacy that reaches as far back as 1981, Sendmail is considered to be the de facto standard for mail transport applications. Sendmail powers four times more internet mail domains world-wide than its largest competitor. Eric Allman wrote Sendmail right at the beginning of what we now know as the Internet, when it allowed the routing of email between ArpaNet and the University of California at Berkeley. Being Open Source, momentum soon increased its list of developers, allowing it to take, and keep, top place.

Being the front runner also makes it first in line for criticism, and the complaints usually revolve around complexity of configuration and security. While I'm sure that security was an issue back when Sendmail was first developed, its authors could not have conceived how the Internet is being used today, and so must always be playing catch-up with any new security threats that come along. Patches come through very quickly, covering over any security flaws that are found. Because of this, some people in the industry consider Sendmail to have a low security rating.
E-mail clients

There are many to choose from, from the delights of Kmail and Evolution, the default MUAs for KDE and GNOME users respectively, to the email clients that come as part of a bigger package, such as Mozilla or Netscape, for those who prefer to have their email and web browsing functionality lumped together in one application. At the moment, the most popular of the graphical email clients to try out is Sylpheed, with its claims of speed and a small memory footprint. For those that do not want to, or cannot afford to, have the memory impact of running a graphical client, there are many console based email clients, such as pine, elm and the ever popular mutt. At the very bottom, some might say top, of the tree stands mail, father to them all. Which one you choose really depends on how much control you want to take over its configuration and who you want to have that control. The more graphical, the less control the end user has.
The alternative view is that, because you constantly have to work closely with Sendmail, updating it and adding security patches, you are much closer to the problem of security, more aware of the problems, with a better understanding of the risks – and are more secure as a result.

Installation of Sendmail is very easy, and it is most likely to be provided as the default email server for most distributions – Debian being the most notable exception, though it is easy to get hold of there, thanks to apt-get. Configuration is a different matter, and many consider this to be Sendmail's greatest weakness. Sendmail's configuration files are designed to be easy for Sendmail to deal with, leaving the person configuring them to suffer in their complexity. Thankfully, some efforts have been made to make this much easier, and most users will generate their configuration from a set of m4 macros. Sendmail offers immediate support for the IPv6 protocol. Much of the challenge of configuration has also been removed by applications like WebMin, Figure 1, which allows you to configure and administer Sendmail, amongst many other applications, through a web interface – remotely, if needs be. With its long legacy, Sendmail also has commercial backing and a range of professional and enterprise products and services.
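To give an idea of what the m4 approach looks like, here is a hedged sketch of a minimal sendmail.mc – the include path and feature set vary between distributions, so treat the file locations as assumptions rather than gospel:

divert(-1)
dnl sendmail.mc – a minimal sketch, not a complete configuration
divert(0)
include(`/usr/share/sendmail-cf/m4/cf.m4')
OSTYPE(`linux')
FEATURE(`access_db')
MAILER(`local')
MAILER(`smtp')

Running m4 over this file generates the real, far less readable, configuration: m4 sendmail.mc > /etc/mail/sendmail.cf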
Exim

Exim has been developed at Cambridge University by Philip Hazel since 1995, and is released under the GPL. It is the default mail server included with the Debian distribution.
GLOSSARY

IPv6: The internet works because of IP addresses, those dotted quads you see, such as 192.168.0.1. Unfortunately, the world has run out of addresses like this, which are part of the IPv4 numbering system. The solution is IPv6, which will use an address length of 128 bits as opposed to the 32 bit addresses currently in use. As an example of how many more IP addresses will be available after the introduction of IPv6: if you said that all of the address space available with IPv4 was just 1 millimetre long, then the address space available from IPv6 would equal 80 times the diameter of our galaxy.
Figure 3: Here the anatomy of the Postfix system is laid out for all to see
Taking what he had learnt from the Smail project, one of the earlier GNU MTA programs to have been developed, Philip Hazel wanted to keep the same lightweight approach, but add a great deal more functionality. Its interface is that of Sendmail, so it can be used as a drop-in replacement on a system that has been installed with Sendmail.

The current version, 4.05, is a complete rewrite of the previous versions, freeing the code from the continual problem of accumulated patches. This has allowed new features to be implemented, making the overall structure more modular and the filtering scripts easier to modify. The new Access Control Lists (ACLs) handle the way Exim deals with differing SMTP commands. Previously, policy control was applied through a series of options that could cause some confusion due to their interactions. By using ACLs an administrator can now write his own tests and know in which order they are performed, and so what interactions occur. The new version also does away with the directors which handled local domains, by incorporating these into the routers, which previously only handled remote domains. Exim again has native support for the IPv6 protocol. Exim supports both Maildir and mbox file formats, convenient if you are migrating from other systems.

Scalability seems to have come naturally to the development of Exim. Even though it was not conceived as a high-performance MTA, it has successfully been used on systems which have reported dealing with up to 800,000 messages a day. The large ISP freeserve.co.uk uses Exim for its email systems.

Exim is not the most secure of the servers available. It is written as a single binary which then has to be run, in effect, as root. This opens up the concern that, should a security exploit be found, there is a greater chance of someone gaining unauthorized root access. This is a remote possibility though. Configuration is much easier than Sendmail, especially for those who do not want to give over their lives to keeping the system running. There are some very useful examples of how Exim can be configured, catering for anything from single user systems to very large mail servers; these can be found at http://sysadmin.oreilly.com/news/exim_0701.html. Philip Hazel has also written the definitive book about Exim, details of which you will find at the O'Reilly site, from the above link.
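To give a feel for the ACL style, here is a sketch modelled on the shape of the Exim 4 default configuration – the named domain and host lists are placeholders that would be defined elsewhere in the configuration file:

acl_smtp_rcpt = acl_check_rcpt

begin acl

acl_check_rcpt:
  # Mail submitted locally is always acceptable
  accept  hosts = :

  # Refuse to relay for domains we are not responsible for
  deny    message = relay not permitted
          !domains = +local_domains
          !hosts = +relay_from_hosts

  accept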
Postfix

Wietse Venema developed Postfix while he was working for IBM; it was released in 1998 as the IBM Secure Mailer and then released to the wider community under the IBM Public License as Postfix. Realizing the strengths and popularity of Sendmail, Wietse developed Postfix to maintain compatibility with it as much as possible, while also ensuring Postfix would be faster, more secure and much easier to configure. At the moment, Postfix does not offer inbuilt support for IPv6, but this can be added as a patch. You have access to both Maildir and mbox email file formats, as well as all of the Sendmail file layout, utilizing /var/spool/mail, /etc/aliases, ~/.forward files, etc.

Postfix has multiple layers of defence to protect the local system against security breaches. There is no direct path from the network to the security-sensitive local delivery programs – an intruder has to break through several other programs first. Postfix does not even trust the contents of its own queue files, or the contents of its IPC messages. No part of Postfix needs to run set-uid root. Performance wise, Postfix is very fast, with claims of being up to three times faster than its nearest competitor.

Postfix has some very useful Unsolicited Commercial E-mail (UCE) control features, needed in these days of spam. You have control over which hosts can relay messages through your system, implementing things such as blacklists and DNS lookups. Content filtering is not available; for that, you will need to use a product such as SpamAssassin. The Postfix website contains lots of useful information to get this mail server up and running; Figure 3 shows an example, explaining the anatomy of how it works.
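The relay and destination controls just mentioned live in main.cf; here is a hedged fragment, with the domain and network values made up:

# Fragment of /etc/postfix/main.cf – example values only
myhostname = mail.example.com
mydomain = example.com
mydestination = $myhostname, localhost.$mydomain, $mydomain
mynetworks = 127.0.0.0/8, 192.168.1.0/24

# Accept mail only for our own domains, or from our own networks
smtpd_recipient_restrictions =
    permit_mynetworks,
    reject_unauth_destination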
Qmail

Qmail is the only mail server to offer a cash reward to anyone able to prove they have found a security hole in it. In March 1997, D.J. Bernstein, the author, offered $500 to the first person to publish a verifiable security hole in the current version of Qmail. No one has won so far.

The Qmail license does not allow the redistribution of modified qmail source code packages. The upshot of this is that you are unlikely to find it installed by default in any of the mainstream distributions. Qmail uses its own method of installation which, while simple to do, does not sit comfortably with those who like to keep their personal RPM databases accurate. Qmail doesn't seem to have been updated in some time; the current version, 1.03, was released in 1998. It does feature support for IPv6 via a patch, and many other features that the other mail servers include in their current versions – like support for Authenticated SMTP and backend access to LDAP – are also available as patches.
The danger of running an open relay

As part of the route that an email must take to get from A to B, it will need to pass through, or be relayed by, other mail servers. Imagine the case where your local mail system, home.com, is sending out a message for someone at away.com. The first hop that your email might take could be from your own local network to the email server of your ISP. From there the email is relayed to mail servers that are nearer the final destination, say, the incoming mail server of the recipient's ISP. In this chain of relays there is an obvious reason for the connection between one machine and the next.

It used to be the case that mail servers would happily relay messages from one server to another. As part of the design of the internet, where there is no 'fixed' route from one server to another, only the most convenient route at the time of transmission, a mail server might have found itself with an email following some obscure path. It was quite happy to pass on the email to a server nearer the destination, in order to keep the system running as smoothly as possible.

This open, good natured approach to running email throughout the Internet was, of course, going to be abused by people keen to make a fast buck. Someone sending spam could directly address their emails to go to a mail server and, by forging the information in the header of that email, they could be confident that their mail would be passed on through the system for eventual delivery to someone who did not really want to receive it. The mail server was open to pass on any mail, which also meant it was open to abuse. It was an Open Relay.

The promiscuous relaying of email was the default mode for Sendmail prior to version 8.9, and seeing as it powers the largest number of mail servers, the possibilities for abuse were everywhere. Now it is deemed necessary to have much tighter control over what you allow your mail servers to relay. The need for servers to relay messages is still there, but only for well defined cases, which now need to be explicitly configured. The most common case is that where a mail server is also acting as a gateway to some other network which may not have a permanent connection.

If you are going to run your own mail server, you take on the responsibility for making sure it cannot be used as an open relay. The scourge of such relays has been enough to prompt the creation of 'blacklists', such as the one at http://www.ordb.org. These 'blacklists' can be accessed by mail servers directly while they are handling a piece of email. Should that email have followed a route that includes one of these 'blacklisted' mail servers, then the email is handled with extreme prejudice, usually being bounced back to the originator with a note saying why.

Figure 4: The Open Relay Database site where you can get help for fighting spam

So, should you be administering a mail server used by others and it becomes 'blacklisted' because it has been poorly configured, your users are going to come down on you like a tonne of bricks, wanting to know why their email is not getting out. Best to get it configured right first time. For many though, this is not likely to be a problem, as all of the recent mail servers come with relaying turned off by default.
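A crude way of checking is to talk SMTP to your server by hand from a machine outside your own network and ask it to relay. A sketch of such a session, with all the addresses invented:

telnet mail.example.com 25
HELO test.example.org
MAIL FROM:<someone@example.org>
RCPT TO:<victim@elsewhere.example.net>
QUIT

A correctly configured server will refuse the RCPT TO with a response along the lines of "554 Relay access denied"; an open relay will cheerfully accept it.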
SpamAssassin

UCE, also known as spam, is the scourge of the earth, I'm sure you will agree. Direct marketing via email might have been alright if those doing the marketing had had the ability to be a little more selective about those they decided to spam; instead, far too big a percentage offer products and services of a most unsavoury nature. Users of small home networks have to accept the responsibility of stopping UCE being delivered to kith and kin. Administrators must equally take on the responsibility of stopping company staff receiving UCE; the deluge of spam can mean a real waste of time, and there is the danger that bona fide email will get lost in amongst it all. What tools are available then?

The most spoken about at the moment is SpamAssassin, which works as a mail filter, analysing each mail for common attributes found in spam. These attributes include searches for certain well known phrases in both the header and body sections of emails – phrases like "MAKE MONEY NOW!!!" – and counting the number of times a '$' is repeated. The more it is repeated, the more likely the mail is spam. There are also tests done on the header section of an email to add credence to its validity; many spam emails will have some element of the header forged.

A ruleset is built up of these attributes. Each element in the ruleset is given a weight of importance, a number of points. These points get totalled as the elements are found in the email. If an email receives enough points it will be deemed worthy of being called spam. You as administrator can fine tune the points per element and the threshold value to suit your needs if necessary. The ruleset that comes as default seems to catch most cases.

This is good, but it gets even better if the rulesets are shared.
Listing 1: Output of a dnsquery

colin@murphy.me.uk:~> dnsquery -n 192.67.202.2 murphy.me.uk
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 20266
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 2, ADDITIONAL: 5
;; murphy.me.uk, type = ANY, class = IN
murphy.me.uk.            1D IN A      10.30.167.39
murphy.me.uk.            1D IN MX     10 mx0.domain-reg.co.uk.
murphy.me.uk.            1D IN MX     20 mx1.domain-reg.co.uk.
murphy.me.uk.            1D IN NS     ns.hosteurope.com.
murphy.me.uk.            1D IN NS     ns2.hosteurope.com.
murphy.me.uk.            1D IN SOA    ns.hosteurope.com. hostmaster.murphy.me.uk. (
                                      2002090710   ; serial
                                      8H           ; refresh
                                      2H           ; retry
                                      1W           ; expiry
                                      1D )         ; minimum
murphy.me.uk.            1D IN NS     ns.hosteurope.com.
murphy.me.uk.            1D IN NS     ns2.hosteurope.com.
mx0.domain-reg.co.uk.    4H IN A      192.67.202.235
mx0.domain-reg.co.uk.    4H IN A      192.67.202.237
mx1.domain-reg.co.uk.    4H IN A      192.67.202.241
ns.hosteurope.com.       13h21m13s IN A 192.67.202.2
ns2.hosteurope.com.      13h21m13s IN A 192.67.203.246
Now, when a new piece of spam is written which happens to slip through SpamAssassin's net, a new rule element will be written to block it in the future and propagated out to everyone who is registered to use this shared database. With any luck, most of the people taking advantage of this distributed ruleset will never be inconvenienced by this new piece of spam, because the new ruleset will be in place before the message reaches them for the first time. The Razor project, http://razor.sourceforge.net, helps to provide this distributed database of rulesets, which SpamAssassin can then call upon. SpamAssassin will also refer to some of the 'blacklists' that are available, like http://www.ordb.org/ and http://mail-abuse.org/. These 'blacklists' contain details of Open Relays, which are very often used as conduits for sending spam.
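For a single user, the easiest way to put SpamAssassin into the delivery path is through procmail. A minimal sketch, assuming the spamassassin script is installed and relying on the X-Spam-Status header it adds to each message:

# ~/.procmailrc fragment – pass every message through SpamAssassin
:0fw
| spamassassin

# Anything it has tagged goes into a separate folder for later review
:0:
* ^X-Spam-Status: Yes
spam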
Buying domain names and access to MX records

If you are going to run your own mail servers, you will want a domain to call your own, from which you can send email and be more easily remembered, so people can send you email. You may not value the domain name given to you by your ISP, usually because it is just downright long winded or clumsy – colin@murphy.org is far snappier than colin@murphy.nameofisp.co.uk, for example. Getting your own domain is simple enough: just enter the phrase 'domain registration' into your favourite search engine to find a domain name registrar, and have your credit card handy to pay for the privilege.

Be warned though, you need to take care to make sure you get the amount of control that you want and need – the most important being the control needed to get your email to do what you want. The amount of control you have revolves around how much access you get to changing the details in your DNS entry. This is done via some web front end to your registrar's administration program. For full control over how your email is handled for this domain, you need to be able to amend the MX records in the DNS entry, because you need them to point to your mail server. Listing 1 shows the output from a dnsquery.
Figure 5: The web-based administration tool used by DynDNS
The dnsquery command allows you to see the details held in the DNS database about a given domain – in this case murphy.me.uk – selecting the domain name server with the -n switch. Here you can see the MX records firmly pointing to domain-reg.co.uk, the company used to register the domain. Unfortunately, for me, I discovered that you cannot change the MX records for this type of registration, at least for this type of package – the cheapest available! On further research I could not find anyone that openly gave you the option of controlling this record. This means that the registration company will have control over any email sent to this address. There is nothing wrong with this, and they provide the facility to forward mail on to somewhere else; it is just that I still have to rely on their server, not just my own. Should this functionality be of greater importance to you, then you will need to make sure that you can change the MX record as you want. You will need to ask the domain registrars directly to confirm that they offer this feature. Alternatively, you may find the services of dynamic DNS providers useful.

The importance of being MX

MX records in the DNS entry allow email traffic sent to the domain to be directed to some other IP address – the address of the mail servers. There can be more than one entry, because you can have more than one server running as a fallback, each with different priority settings. In Listing 1, mx0.domain-reg.co.uk has a priority of 10, making it the first port of call over mx1.domain-reg.co.uk with its priority of 20. Should mx0 not be available for some reason, mail will be sent to mx1 instead. If the MX record is blank, then the information in the A record is used.
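If you have dig installed, you can also query just the MX records, which is a quick way of checking what a registrar has actually set up – here run against the domain from Listing 1:

dig murphy.me.uk mx +short
10 mx0.domain-reg.co.uk.
20 mx1.domain-reg.co.uk.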
No fixed abode

If you have control over your MX record, then you will want to configure it so that it points to the IP address of the machine you are going to run as a server. But what do you do if you don't have a fixed IP address, because you are using a cable modem or similar? There are Dynamic DNS services which will allow IP addresses to be mapped to other addresses, so you can have the IP address of your domain mapped to the IP address of the actual mail server. But the IP address of the mail server is still dynamic – likely to change at the whim and fancy of your service provider – so how does the dynamic DNS service know where to map the address to? This can be taken care of by using one of the small clients that will periodically question your server machine about which IP address it currently has, and automatically pass this information on to the dynamic DNS service provider.

Some dynamic DNS services, like DynDNS, offer a free service where you can register one of their subdomains. Now you can have an email address like me@colin.isa.geek.net or, if something more serious takes your fancy, me@colin.dyndns.org, and have any mail addressed to it sent straight to your mail server. Figure 5 shows you the configuration screen of a newly created host, which will map to the specified, and wholly made up, IP address. If you want to use your own domain name and not a subdomain of DynDNS, then you can still do this, but there is a one-off charge. This then gives you room in the DynDNS name server for the domain you had previously registered. Now you can expect email sent to your domain to reach you, even though you have a dynamic IP address.
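Clients of this kind are normally driven by a small configuration file. Here is a hedged sketch for ddclient, one such update client, with every account detail made up:

# /etc/ddclient.conf – example values only
protocol=dyndns2                   # update protocol spoken by DynDNS
use=web, web=checkip.dyndns.org    # discover the current external IP
server=members.dyndns.org
login=myaccount
password=mysecret
colin.dyndns.org                   # the host name to keep up to date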
Options

The option to control your own email is valuable in its own right. Maybe the system you have in place at the moment works well enough, and you only want to consider running your own mail server as a means of providing a back-up email service, should you find your ISP's provisions have let you down. There is a certain amount of effort involved – how much depends on your circumstances. But you have the option, at least. ■
Product Reviews

We take a look at the latest products to hit the office. These are designed to make your life easier and more fun. BY JOHN SOUTHERN
Flip2Disk

This is a handy sized portable hard drive. Having a capacity of 40GB, it allows you to carry serious data around in a convenient form. Just fitting into a shirt pocket, though more easily carried in its little carry case, the Flip2Disk has at its heart a 2.5" hard disk. The model supplied was a Fujitsu 40GB. To save on broken drives, this is then encased in a "Shoc-Bloc". The Shoc-Bloc consists of a rubberised protective wrapper holding the drive inside a toughened plastic outer case. It is quoted as being able to survive a 2 metre drop, and we can verify a 1 metre drop, as we knocked it off the desk onto a tiled floor by accident.

The Flip2Disk is connected via a choice of cables ranging from Parallel port to USB 2 and PCMCIA. Software is supplied for other Operating Systems, but under Linux, via USB, it acts as a SCSI hard drive. The most obvious uses are to carry data to a client's machine or as a disaster recovery device. In the office we found it useful for moving video files about without tying up the network. The Flip2Disk PCMCIA converter is probably the most useful, allowing you to connect easily to a laptop. We were a little disappointed that, although Linux compatibility is mentioned buried on Amacom's website, it was not quoted on the box. The other drawback was price. A 40GB 2.5" drive can be had for just over £100, so the extra £250 for a rubber and plastic housing does seem quite expensive. ■

Flip2Disk
Amacom Technologies Limited
40GB £385  30GB £289  20GB £195  10GB £159
http://www.amacom-tech.com

Uplink

It is hard to summarize this game without just filling the page with adjectives such as twisted, great, unique, fun, warped, wonderful, dangerous and, above all, compulsive. Set in the near future, you play the part of a high tech computer criminal. Through a series of challenges you learn how to break into computers over networks. How to earn money selling other people's stolen information. Learn how to hide from all others and how to trust no one. A demo version is available to download. My advice is don't, because you NEED to buy this now. I can see my whole waking life taken up with this game, and I am deliriously happy about it. ■

Uplink
Introversion Software
£19.99  Bonus disc £5
http://www.introversion.co.uk
BRU-Pro version 2.0

BRU-Pro aims to be a robust and easy to use backup solution with native SCSI tape drive support. Installation was via an install script rather than from source or RPM. Although the installation ran smoothly, the Administrator's Guide explains potential problems, such as not being able to execute programs from an automounted CD. After installing the tape server, client and control console, you then have to install the agent software on any other client systems that you are going to back up. This has the option of using network compression and encryption, if your processor can handle the loading. Ports through firewalls are in the range 6661 to 6664.

The GUI, started with xbrup, is password protected. You can also run the software from a console. The system will scan a range of IP addresses to locate clients, which can be on other OSes if needed, with the exception of Novell Netware. Capable of full or incremental backups, the GUI was simple to operate and control. Jobs could be batched and scheduled with ease. Autobalancing is used on multiple drive systems, where the least utilized drive is chosen for the next operation. Restore is simple, giving the options of overwriting or not, and of choosing the destination. Overall a well thought out product, ideal for the small business. ■

BRU-Pro 2.0
Tolis Group Inc.
Server plus three clients US $999
Additional clients US $99
Personal Edition US $59
http://www.tolisgroup.com/bru-pro3.html
Book Reviews

If you are looking to set up a professional system, in need of a reference manual, or a kernel hacker – we have the book for you. BY ALISON DAVIES
Customizing and Upgrading Linux

The book aims at being a quick reference for you to customize and perform a Linux upgrade on a network system. This is the second edition, with new information on the 2.4 kernel and an expanded RAID SCSI section. It is based around a Red Hat 7.1 Server installation, but the older 6.2 is covered too, as many datacenters still run with it.

The style of the book is an easy to follow text that does not patronize the reader. This is due to the McKinnons' previous experience as professional IT trainers. At first glance the book might seem a rewrite of so many installation guides that have gone before. What is different is the depth of information with which each topic is covered. An example would be the notes about placing a swap partition onto another drive so it does not interfere with production reads and writes of a data drive, or the explanation of video interlacing for monitors. Printers and NFS installs are covered, as well as how to optimise a new kernel or build one from a source tree. Overall this answers all the questions about your system where in the past you may have just installed with the default options. ■
L McKinnon, A McKinnon
Wiley Computer Publishing
ISBN 0-471-20885-X
£29.95

Linux Kernel Programming

This third edition has been revised to focus on the 2.4 kernel. The book aims at giving people information on how the Linux kernel works. It is the leading reference for those wishing to write modules or code for the kernel. Starting with compiling the kernel, the text describes the different data structures and algorithms. By chapter four we are deep into the memory management system, covering block device caching and paging. The file system chapter covers only ext2, but goes into detail explaining inode operations and structures. The largest section of the book deals with device drivers and explains concepts from polling to dynamic drivers. There is a working example of a speaker driver within the section. After covering network implementation and module debugging, the final part covers multiprocessing and atomic operations. A third of the book is taken up by appendices of procedure calls.

The CD which accompanied the book was not mentioned or referred to in the text. On examination it held the 2.4.4 kernel, along with a host of documents from the Linux Documentation Project. On the whole, if you want information on the kernel then this book will become a well thumbed friend. ■

Beck, Bohme, Dziadzka, Kunitz, Magnus, Schroter & Verworner
Addison-Wesley
ISBN 0-201-71975-4
£34.95
Linux Administration Handbook

This book takes a practical approach to the administrative needs of a Linux system. Working from the ground up, it informs the reader of the fundamental basics of the various parts of a system – like system initialisation or managing devices – before going into comprehensive detail on how to administer those systems. Because Linux comes in different distributions, each of which can be configured in unique ways, books have usually had to decide on which flavour to follow. This book deals with Red Hat, SuSE and Debian.

The book runs to 890 pages, 18 of which make a full index. 29 chapters break the whole subject up into manageable chunks and stand on their own as sources of information to be plundered. These chapters are broken into 3 sections: 'Basics', 'Networking' – which takes up more than half the book – and 'Bunch O' Stuff', which includes details on running a Linux system that 'Co-operates with Windows' and the 'Policies and Politics' of using Linux. A brief set of exercises follows each chapter, so the reader can confirm they are learning the subject, but thanks to the complete index, this will also form a useful reference work. ■

Evi Nemeth, Garth Snyder, Trent R. Hein
Prentice Hall
ISBN 0-13-008466-2
£39.99
Keeping track of your MP3 files
Losing tracks

It is so convenient to have your stacks of audio CDs compressed down into a few MP3 discs. Now you can have access to your music collection beside your computer at all times – so the real CDs can take their prized position in the living room in the racks, by the Hi-Fi. But will you be able to find the music you are looking for any more? BY COLIN MURPHY

Copying your music CDs to MP3 format gives you a convenient format to play your music on a PC. You have the ability to decide how you want to access your favourite artists and tunes, being able to combine them into collections of your own devising. This is such a useful feature that it is hard to know when to stop, and some people find that they just can't.
Too many MP3s

Even though an MP3 version of an album might only take up about 1/10 of the original size, you may find you have more MP3 files than you are prepared to sacrifice hard disk space for. If this is the case, you will probably want to burn some, if not all, of these files to CD. Now you end up with a collection of data CDs beside your machine that you can play at a moment's notice, which is fine, so long as you do not want to play a particular track of an album. How would you know which MP3 CD has the track you need for that moment, short of writing out hundreds of track listings onto the disc labels or cases? Luckily there is a way: keeping a database of the track listings on your computer – after all, that is where the music will be played.
MP3s include Oggs

For the sake of brevity, I am going to lump all music files into the category of MP3 files, even if you have .ogg files or some other compressed file format. This is not intended as a slur on those who prefer Ogg Vorbis; it's just that most people will be much more familiar with the term MP3.
How to rip and burn a CD

In case you are unfamiliar with the process of copying your audio CDs to a compressed format, such as MP3, here is a very brief outline of the procedure for you to try. Your audio CD holds the tracks of music in its own special file format, about 74 minutes of playing time per CD. Since the audio CD does not have a filesystem such as a data CD would, you cannot just look at its contents with a file browser. To get access to these musical files you will need to call upon the services of a CD ripping utility, of which cdparanoia and cdda2wav are, by far, the most frequently used. These are command line utilities. It is much more friendly, for occasional use, to call up one of the graphical front-ends, like Grip, see Figure 1. Here you can see how Grip has scanned your audio CD, and is now showing track listings and details of the album they have come from.
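If you prefer to stay on the command line, the same rip-and-encode job can be done by hand. A minimal sketch, assuming cdparanoia and an MP3 encoder such as lame are installed:

cdparanoia 1 track01.wav        # rip track 1 of the audio CD to a WAV file
lame track01.wav track01.mp3    # compress the WAV file to MP3

You lose the automatic cddb lookup this way, though, which is one good reason to let a front-end like Grip drive these tools for you.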
cddb databases

This information, in the majority of cases, can be scanned for automatically by looking at one of the cddb databases or, if necessary, typed in by hand. It is quite important that this track and album information is recorded at this stage, because it will become part of the information tag that describes the MP3 file later on. If you follow through with the default options, and ask Grip to rip and encode your audio CD, you will find yourself left with a directory containing MP3 versions of the tracks from the CD. You can play these MP3 files with players like XMMS.
Ripping is fine

This is fine for the first 10 or so audio CDs that you have ripped, but by now you have taken up the best part of a gigabyte of hard drive space. On a desktop machine you may not want to give over all of this space, so you end up copying all of your MP3 files to a data CD – again, using one of the graphical front-ends that are so popular, like CDBakeOven, see Figure 2. But this is just the thin end of the wedge: once you have ripped through all of your CDs, you end up with a pile of discs on which you will struggle to find anything in particular, should you go looking for it. What you need is a database of all of those files.

Tools to do the job

MP3 File Management is a generic term that would cover the utilities that you can find to take on this task, but it might also include streaming files and presenting lists of files through web interfaces, so you need to take a bit of care should you go looking for applications on Freshmeat.

CDDB and online CD databases

The original cddb is a database, usually one that you will look at online, in which you look up information about your audio CD using the unique disc ID as a key to query the database. Some ripping programs, such as Grip, automatically look at a cddb server to get the album and track listing details. Sometimes, if your CD is a very recent release, or is unusual, you may find that the information is not in the database. This then means you will have to enter this information by hand, uploading these new details to the cddb database for the next person. For more information on these services, see http://www.freedb.org/ or http://www.cddb.com/

Stand alone or use a database?

Your options for building your MP3 database fall into two camps: those that stand alone and those that require a database server like MySQL or PostgreSQL. Which one you choose will depend very much on how you want to access your database and what sort of system you are already running. Should you have applications running on your machine that require the services of a full blown database program, or should you really intend to parse your MP3 database through a web interface, then you might want to choose something along the lines of the DigitalDJ program, which takes away some of the effort in dealing with the databases. On the other hand, if you have no need or desire for such an 'in-depth' option, then you might prefer to take the far simpler route and use a stand alone program such as GTKtalog. Following on, next is an example of using a stand alone application.

GTKtalog – the standalone approach

In the worst case scenario, you will have realized that you really need to have some sort of database to help you manage the pile of MP3 CDs that you have already created.
Figure 1: Grip is a graphical front-end that will help you 'rip' tracks from an audio CD

Figure 2: CDBakeOven being used to copy mp3 files to a CD. The GUI front-end makes the operation simple to use
Figure 3: Here you can see GTKtalog listing details of my MP3 collection
You will want an application that is going to be able to take all of the track listing details automatically from your CDs to build your database. GTKtalog will scan the MP3 discs that you have made, automatically adding the details to a hierarchical database. The procedure is very simple: pop the CD into the drive and hit the 'Add CD' button from the main menu. GTKtalog will scan the disc for all of the information and add what it finds to its database. There is no need to mount the disc, and GTKtalog will even spit the disc out when it has finished scanning the files and adding them to the database. This saves time, as now you can be ready to pop in the next disc.
Label wise

If you have been wise enough to add a label to your data CD when you created it – in effect, giving the CD a title – then GTKtalog will use that title as part of its reference for that disc. Because my music collection has no form or reason, I just title my MP3 CDs numerically, as you can see in Figure 3. You may have your music collection in some sensible order, which will mean you can file your MP3 discs in an equally sensible order.
Figure 4: Amending the disc title in GTKtalog
If you can't do this, it doesn't really matter, so long as you can identify an individual disc from the complete disc set. If you have not bothered to burn your discs with labels included, then you will need to adjust the disc title by hand, as you can see in Figure 4.

Sensible structure

The downside is that GTKtalog does not look at the MP3 tag data that was looked up from the CDDB database and saved when you created the MP3 file. Luckily, when programs like Grip save their MP3 files, they put them in a sensible structure which usually follows a particular form, for example artists_name/album_name/tracks. This is enough information on its own to help you locate tracks with GTKtalog.

You also get the chance to search for items in this database. Figure 5 shows you one of the search screens. In this example you can see all of the places where 'old' appears in either the track or album title, or in the artist's name.

Figure 5: Using the search facilities in GTKtalog to look for all of my 'old' music

Now to own up. GTKtalog was not written specifically to look after MP3 files. All it is doing is saving a hierarchical tree based on the directory and file names found on a CD. The plus side of this is that you can make a catalogue of any CDs – possibly photo CDs, if you are in the habit of giving each digital photograph a sensible file name.
Figure 6: The MP3 Database project could have much promise, should someone take on the challenge

Figure 7: This GUI in DigitalDJ allows you to initialize the MySQL database
Should you have never played with databases like MySQL or PostgreSQL, then taking this route to track location nirvana will lead you to a very steep learning curve. None of the packages looked at offered any guidance on getting the database side up and running; all knowledge of this part of the process was assumed by the developers to have already been gained by the user. This is a shame, as not everyone has the need to use databases at all, and many will not have them installed, even though both MySQL and PostgreSQL come supplied with all of the main distributions.
Extra features

There will be some extra overhead on your machine, seeing as you will also have to run the database server to access your store of MP3 information, and there is a learning curve to follow in actually getting the server up and running in the first place, but this might be worth it for the extra facility. Many of these front-ends are no more than scripts that take the tag data from the MP3 files and post it to one of the databases, usually MySQL. There are some exceptions that do offer a more unified approach, like mp3db, which you can see in Figure 6. Unfortunately 'The MP3 Database' project has been discontinued; maybe someone reading might pick it up from where Benny Mueller left off. Details can be found at http://mp3-database.sourceforge.net/.

Figure 8: Selecting from your DigitalDJ database
Exceptions to the rule

Other notable exceptions are DigitalDJ and GMMusic. DigitalDJ complements its sister product, the ripper and encoder Grip, and is designed to use MySQL as its back-end database. Grip takes the responsibility of writing to your MP3 database with just the press of a button. This will only happen if you have initialized it first in DigitalDJ. Figure 7 shows you the GUI to achieve this. It's quite straightforward, assuming you already know that the password for the root account of a freshly installed MySQL server is [blank], except that the root account could well have a different name, like 'mysql' if you are using SuSE, except in the situations where it is not, etc. Ah, just starting out on the gentle slopes of that learning curve.
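If you do have to set up the database side yourself first, the steps look something like the following. This is a hedged sketch – the database name and account are invented for illustration, not taken from DigitalDJ's documentation:

mysql -u root -p
mysql> CREATE DATABASE mp3catalogue;
mysql> GRANT ALL ON mp3catalogue.* TO mp3user@localhost IDENTIFIED BY 'secret';
mysql> FLUSH PRIVILEGES;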
You can look up, select and play your required tracks via the 'Query' tab, see Figure 8. DigitalDJ does not offer you an extensive array of searching tools; you really are limited to finding what you are looking for by selecting artist or genre types. Once you have settled on a track, you can add it to a playlist, which you can save for later use. This is not much use for listening to music on the fly – it is most likely that some of the tracks will be spread about on other discs, and all that disc swapping does not make for relaxing listening. A much better use would be to compile another MP3 disc with the tracks from your playlist. This way you would have convenient access to the music that best suits your mood.

GMMusic needs PostgreSQL to provide the back-end database, and calls upon having both the Perl and PostgreSQL-perl modules loaded. Although not specifically intended for cataloguing MP3 files, this program also has the facility to store details of your normal audio CDs, as well as LPs, audio and video tapes, and even Minidiscs. Another interesting feature is the statistics page, which not only tells you how big your music collection is, but breaks this down into various categories, like media type and date of recording, going as far as making an estimate of how much your audio collection is worth, based on media prices. Quite how you value your MP3 collection could be a matter of debate for some. ■
INFO
Grip & DigitalDJ: http://www.nostatic.org/
CD Bakeoven: http://cdbakeoven.sourceforge.net/
MySQL: http://www.mysql.com/
PostgreSQL: http://www.postgresql.org/
GMMusic: http://gmmusic.sourceforge.net/
Figure 9: GMMusic allows for some detailed statistical reckoning
photopc

Picture Mining

There are lots of GUI tools available for accessing pictures stored on digital cameras, but we are going to take a look at a tool for the command line, photopc, which is useful for automating tasks in a scripted environment. BY HEIKE JURZIK

GUI tools allow you easy access via preview functions, but if you regularly produce a large number of pictures that you will be processing on your computer, you will appreciate a tool that you can use for shell scripting – a tool such as photopc.

Installing the Source Code

The subscription CD in this issue includes the files photopc_3.05.tar.gz [1] and photopc-3.05J23.tar.gz (USB support) [2] in the LinuxUser/photopc/ directory. You should install only one of these packages, depending on whether you will be attaching your camera to the USB port. After mounting the CD, ensure that you are the superuser, root, and follow these steps:

asteroid:~# cd /usr/local/src/
asteroid:/usr/local/src# tar xzvf /cdrom/LinuxUser/photopc/photopcXY.tar.gz
This creates a new directory called photopcXY. Now change to the directory (cd photopcXY) and type ./configure. If everything works out correctly, you should see the following:

creating ./config.status
creating Makefile
creating config.h
config.h is unchanged
creating dos/version.h
dos/version.h is unchanged
creating win32/version.h
win32/version.h is unchanged
You can then go on to complete the final two steps:

asteroid:/usr/local/src/photopcXY# make
[...]
asteroid:/usr/local/src/photopcXY# make install
[...]

KNOW HOW

Although GUIs such as KDE or GNOME are useful for various tasks, if you intend to get the most out of your Linux machine, you will need to revert to the good old command line from time to time. Apart from that, you will probably be confronted with various scenarios where some working knowledge will be extremely useful in finding your way through the command line jungle.
If everything worked out, and no error messages were displayed, you will find that the program has been installed to /usr/local/bin.
Before You Start

Your camera will either be attached to your USB port via a USB lead, or to a serial port using a serial lead – this depends on your computer and the type of camera you are using. If the camera is attached to a serial port, you will need to access the port explicitly, using the -l flag (that is a lower-case "l" as in "Lima") and the device name, each time you launch the program. To simplify this, you can create a symbolic link for the device. If the camera is attached to the first serial port, ensure that you are the superuser, root, and then enter the following command:

asteroid:~# ln -s /dev/ttyS0 /dev/photopc
The superuser, root, can now use the tool without any trouble, but "mere mortal" users will need access privileges before they can access the program as planned. As already mentioned, the symlink /dev/photopc points to the serial interface to which the camera is attached. In order to communicate with the camera without root privileges, your users will need write privileges for the device. You can check the access privileges for the interface using the ls -l command (see also Box 1):

asteroid:~# ls -l /dev/ttyS0
crw-rw----   1 root   dialout   4,  64 Jun 30 16:30 /dev/ttyS0
The first character represents the file type – in this case it is a "c" for "character device". The "r" refers to read privileges and the "w" to write privileges. They could also be followed by an "x", for executable, i.e. the right to launch the file. The first group of three characters refers to the owner of the file, the next three to the group, and the last three represent any other users on the system. For the serial interface in our example this means that root (the owner of the file) and the members of the dialout group, which was created by the Debian Woody distribution for this device, have read and write privileges. (This would still apply if the group had a different name.) To check whether you are a member of this group, use your normal user account and type groups:

huhn@asteroid:~$ groups
users cdrom floppy sudo audio video dos cdwrite
To add the user huhn to the dialout group, ensure that you are the superuser, root, and edit the /etc/group file. Look for the dialout group in this file and add the user as required. (To add multiple users, simply use a comma-separated list.) The entry in the /etc/group file will thus read:

dialout:x:20:huhn
If you use shadow passwords for groups (file /etc/gshadow), you will need to edit this file and add the user to the group:

dialout:*::huhn
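Rather than editing /etc/group and /etc/gshadow by hand, most distributions also ship the gpasswd tool from the shadow suite, which keeps both files in step. A one-line alternative, assuming the tool is installed:

asteroid:~# gpasswd -a huhn dialout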
GLOSSARY

Symbolic link: A link to another file that is treated by the application program exactly as if it were the file itself. If you delete the file the symlink points to, any commands using the link will be pointing into empty space. Symlinks are created using the "ln -s" command.
To apply these changes, type newgrp dialout while logged in as a normal user. Alternatively, just log on again to enable the new group membership.

Depending on your distribution there are varying approaches to working with USB. Some systems may allow you to simply attach your camera to the USB port and power the camera on. If this does not work, you can refer to the approach shown here to access your USB camera via photopc. Make sure that you check your access privileges for /proc/bus/usb first. Then attach your camera to the USB port, fire up the camera and type:

asteroid:~# ls -l /proc/bus/usb/001
total 0
-rw-rw-r--   1 root   root   18 Jul 31 17:19 001
-rw-rw-r--   1 root   root   18 Jul 31 17:19 002

Mere mortal users are apparently not allowed to talk to the camera, as both devices belong to the root user and group. But changing that requires only a few steps. As previously seen in the example with the serial port, first ensure that you are the superuser, root, and then edit the /etc/group file. Add a new group called usb. The numbers for system groups are usually in the range 0 through 99. User groups follow from 100 upward – although exceptions are possible. Locate an unused number for the group (we used 51 on our test system) and enter the group number with your account name:

usb:x:51:huhn
If shadow passwords are used, you will need to add an entry to /etc/gshadow:

usb:*::huhn

Now all we need is an entry in /etc/fstab:

none  /proc/bus/usb  usbdevfs  auto,devmode=0664,devgid=51  0  0
This modifies the file rights for USB devices to allow members of group 51, that is usb, read and write access. There is also a way to provide a shortcut for launching the program. You would normally need to type the -u parameter explicitly in order to tell photopc to use the Universal Serial Bus. However, the bash command alias can save you a lot of typing. Add the following entry to your .bashrc:

alias photopc='photopc -u'

Now call source ~/.bashrc to update the system with the modified file.
Off to Work!

Assuming that everything is configured right, contacting the camera should be no problem:
Box 1: Interface names in Linux

If you take a look at your /dev directory, you will find a number of entries that appear somewhat cryptic at first sight. Linux uses device files to communicate with hardware devices (such as hard disks, floppy drives, mice, sound cards and so on). The first character is either b (for "block device") or c (for "character device"), and represents the access mode. Important device names are:

/dev/hd* – IDE drives
/dev/sd* – SCSI drives
/dev/tty* – virtual terminals
/dev/lp* – parallel ports (printers etc.)
/dev/scd* – SCSI CD ROM drives
/dev/ttyS* – serial ports

Some of these device entries are represented by links, for example you can access the /dev/cdrom entry on /dev/hdc (a CD ROM drive attached to the second IDE bus) and /dev/mouse on /dev/ttyS1 (the second serial port). The USB device file system is generated dynamically, in a similar fashion to the /proc file system, and is normally to be found under /proc/bus/usb. Directories following the "00n" pattern contain the ports for active USB devices. The files devices and drivers contain an overview of the devices currently attached and any drivers assigned to them. The kernel is responsible for creating these directories (provided it can support USB). As the data in /proc/bus/usb/devices is quite extensive, you will probably want to use an X application, such as usbview, to keep track of all the attached USB devices.
huhn@asteroid:~$ photopc query
Found usb device id 0x100 by vendor 0x7b4
Found usb camera: Olympus Optical Co., Ltd. C-2100/C3000/C3040 Camera
Starting in folder "\DCIM\100OLYMP"
Resolution: 7 - SQ1-1280x960-Normal
Camera time: Wed Jul 31 22:22:19 2002 CEST
[...]
You can launch photopc with the -h (help) flag set to display a complete overview of the available parameters and command options. You will need to use the less pager to prevent the output simply scrolling off screen:

huhn@asteroid:~$ photopc -h | less
Listing 1: Output of the list command

huhn@asteroid:~$ photopc list
No.  Size     R  P  Date and Time                  Filename
[...]
45  45  262400  83887040  -  Mon Jul 29 23:24:43 2002 CEST  P7292878.JPG
46  46  276982  83887040  -  Wed Jul 31 16:39:24 2002 CEST  P7312879.JPG
47  47  275218  83887040  -  Wed Jul 31 18:41:30 2002 CEST  P7312880.JPG
Listing 2: photopc script

#!/bin/bash
# define photopc call type (for USB in this example, use "photopc" for serial)
PHOTOPC="photopc -u"
# Create target directory (if not already created)
echo "Type the name of the directory:"
read mydir
mkdir -p $mydir || exit 1
# Count pictures -- uses the last line of output
number=`$PHOTOPC count | tail -1`
echo "There are $number pictures on the camera."
# If a list is required, pipe output to more
echo "Would you like a list of pictures?"
read answ
if [ $answ = "y" ]
then
  $PHOTOPC list | more || exit 1
fi
# Selecting pictures
echo "Which pictures would you like to download? (poss. entries e.g.: 1 or 1-30 or 1,2,10)"
read range
$PHOTOPC image $range $mydir || exit 1
# Thumbnail selection
echo "Would you like thumbnails of the pictures? (y/n)"
read answ
if [ $answ = "y" ]
then
  $PHOTOPC thumbnail $range $mydir || exit 1
else
  echo "No thumbnails requested."
fi
The count command will count the pictures on the camera. If you need a more precise overview, you can try list instead (Listing 1). The file names shown there might seem a little cryptic – these are the camera's internal names for the image files. When you download the images, photopc uses a default format of "MMDD_NNN.jpg" (month, day and number) for storing the images on your hard disk. Launch the image command to do so:

huhn@asteroid:~$ photopc image 1 .
Found usb device id 0x100 by vendor 0x7b4
Found usb camera: Olympus Optical Co., Ltd. C-2100/C3000/C3040 Camera
Starting in folder "\DCIM\100OLYMP"
1: 279907 of 279907 taken
Fri Jul 05 20:05:44 2002 CEST
file "./0705_001.jpg"
This syntax downloads the first image on the camera to the current working directory (represented by the period, “.”). If you want to store multiple images, you can designate a range (e.g. photopc image 1-5 .), or supply a comma-separated list of images (e.g. photopc image 1,2,5 .). The thumbnail command is extremely useful – instead of downloading the full images, you can create miniatures.
Fully Automatic

The options shown so far are available in other programs, of course. What makes command-line tools so special is the fact that you can integrate them neatly into shell scripts. This allows you to combine single steps effectively. The script given in Listing 2 shows an interactive script for photopc. ■
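Should you want the download to happen with no interaction at all – from cron, say – the same commands can be strung together non-interactively. A hedged sketch, reusing only commands shown in this article; the target path is made up:

#!/bin/bash
# Fetch every picture from the camera into a dated directory
dir=$HOME/pictures/$(date +%Y%m%d)
mkdir -p "$dir" || exit 1
# count prints the number of pictures as the last line of its output
number=$(photopc -u count | tail -1)
[ "$number" -gt 0 ] && photopc -u image 1-$number "$dir"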
INFO
[1] http://photopc.sourceforge.net/
[2] http://www.math.ualberta.ca/imaging/
[3] http://www.lightner.net/lightner/bruce/ppc_use.html
The Sysadmin’s Daily Grind: OpenVPN
Secure Connections

Being able to work anywhere in the world just as if you were attached to the company's LAN is an appealing prospect, but far too dangerous without taking security measures. BY CHARLY KÜHNAST

The solution to this issue is well known: use an encrypted point-to-point connection to build a tunnel through an insecure network. For Linux there are a few popular tools available to help you with this task, such as Cipe and the freeware IPSec implementation FreeS/WAN – and of course OpenVPN [1], the tool we will be investigating in this issue.
Used: OpenSSL and TUN/TAP

The prerequisites for running OpenVPN are the OpenSSL library and the TUN/TAP driver. Any of the current distributions should contain both – if not, check out [2] and [3]. Just unpack the tarball, which is slightly over 200 KBytes, and say the magic words: ./configure; make; make install. To demonstrate the principle of OpenVPN, let's open up an unencrypted tunnel between two computers called left and right. The IP addresses of the ethernet interfaces are 1.2.3.4 for left and 4.3.2.1 for right. We can now assign private IP addresses to the endpoints of the tunnel on both computers (/dev/tun0). So let's assign 10.0.0.1 to the endpoint on left and 10.0.0.2 to the corresponding endpoint on right. Before we get started, we need to load the TUN driver.

modprobe tun

should do the trick. We will also need to enable IP forwarding using the following syntax:

echo 1 > /proc/sys/net/ipv4/ip_forward
We can now type command for left
the
following and the following syntax for right:
openvpn --remote right --dev U tun0 --ifconfig 10.0.0.1 U 10.0.0.2
and a similar command for right
MTRG ............................................56 The Multi Router Traffic Grapher’s speciality is monitoring network traffic and displaying the results as graphs.
openvpn --remote left --dev U tun0 --ifconfig 10.0.0.2 U 10.0.0.1 --secret key
The tunnel is up and running.
All done! Of course, a shared secret is not all that secure, if you are in a tight corner. So I would not recommend using this method for official secrets. If you need to keep data secret, you might want to opt for a TLS based approach – OpenVPN offers you that possibility. ■
Encrypting the tunnel
The simplest way of doing this is to agree on a shared secret. To do so, we simply type

openvpn --genkey --secret key

on one of the computers. This creates a key file containing random data that still needs to be copied securely to the second computer – we can use scp for this purpose. Let's start up the tunnel using the following syntax for left:

openvpn --remote right --dev tun0 --ifconfig 10.0.0.1 10.0.0.2 --secret key

and a similar command for right:

openvpn --remote left --dev tun0 --ifconfig 10.0.0.2 10.0.0.1 --secret key

All done! Of course, a shared secret is not all that secure if you are in a tight corner, so I would not recommend using this method for official secrets. If you need to keep data secret, you might want to opt for a TLS based approach – OpenVPN offers you that possibility. ■
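A TLS based setup might look something like the following sketch. The option names come from OpenVPN's TLS mode, but the certificate, key and Diffie-Hellman file names are invented for illustration – consult the OpenVPN documentation for the exact options in your version. On left:

openvpn --remote right --dev tun0 --ifconfig 10.0.0.1 10.0.0.2 \
  --tls-server --dh dh1024.pem --ca ca.crt --cert left.crt --key left.key

and on right:

openvpn --remote left --dev tun0 --ifconfig 10.0.0.2 10.0.0.1 \
  --tls-client --ca ca.crt --cert right.crt --key right.key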
INFO
[1] OpenVPN: http://openvpn.sourceforge.net
[2] OpenSSL: http://www.openssl.org
[3] TUN/TAP drivers: http://vtun.sourceforge.net/tun
THE AUTHOR
Charly Kühnast is a Unix System Manager at a public datacenter in Moers, near Germany's famous River Rhine. His tasks include ensuring firewall security and availability and taking care of the DMZ (demilitarized zone). Although Charly started out on IBM mainframes, he has been working predominantly with Linux since 1995.
SYSADMIN
OpenSSH
OpenSSH from the Administrator's Perspective

Out of Sight

OpenSSH has become the standard tool for providing encrypted remote access. But you will need background knowledge if you intend to implement the security features that OpenSSH offers. BY ANDREW JONES

You still occasionally come across them – computer networks where protocols such as rsh, rlogin, telnet, FTP and POP3 will transmit passwords in clear text across the LAN and into the Web. It is a well-known fact that any host in the path between the client and the server can view these passwords, and packet sniffers such as Sniffit or Dsniff can make it child's play to do so. Networks that rely on these traditional services are low-hanging fruit for crackers or script kiddies, and the services we just mentioned are the top targets for exploitation. Having said that, there is no real reason to expose a network to this risk. SSH provides a viable alternative with far more functionality than rlogin, rcp, or telnet could offer. In addition to providing secure authentication for hosts and users, SSH offers encrypted data transfers and recognizes attempts at manipulation. The term SSH refers both to the cryptographic protocol and to its implementations. In contrast to IPSec, where encryption occurs at IP level, SSH provides encryption within the application itself.
Installing SSH
Most current Linux distributions include OpenSSH packages. If you are a Debian user, you can type "dpkg -L ssh" to discover which SSH files are currently installed. The equivalent command for any RPM based system would be "rpm -ql openssh". The post-installation script creates the server keypairs (private and public keys) for both versions of the protocol and stores them in the "ssh_host_key" and "ssh_host_key.pub" files for SSH1. SSH2 can utilize RSA and DSA keys, storing RSA keys in "ssh_host_rsa_key" and "ssh_host_rsa_key.pub" and DSA keys in "ssh_host_dsa_key" and the matching "ssh_host_dsa_key.pub". Only the root user should possess read/write access to the server's private keys (those without the ".pub" suffix). The server uses a total of six keyfiles to authenticate against various clients. If a key file is missing, a client that requires that specific key type will not be able to connect. But missing server keys can be created later, if needed, using the "ssh-keygen" command. The configuration file found at "/etc/ssh/ssh_config" contains system defaults for the client program "ssh", although it tends to be more or less empty under normal circumstances. Users can configure their clients via "~/.ssh/config" or on the command line.

THE AUTHOR
Andrew Jones is a contractor to Linux Information Systems AG (http://www.linux-ag.com) in Berlin. He has been using Open Source software for many years. Andrew spends most of his scarce leisure resources looking into Linux and related Open Source projects.
Configuring the Server
The SSH server, "sshd", normally runs as an independent daemon; however, it can be launched via inetd. The daemon method will provide better performance, as "sshd" needs to calculate a server key for SSH1 when launched. However, the inetd variant does provide a practical fallback solution. If the daemon crashes, an admin user will still be able to log on remotely and solve the issue. Of course, the second server will need to listen on a different port ("sshd -i -p port"). You can also use the config file to change the port number (Listing 1, line 8). For multi-homed hosts the admin user can specify the address on which "sshd" will listen. The daemon will bind to any and all available addresses by default. You can change the default behaviour using the "ListenAddress IP" syntax (line 11); multiple occurrences of this option are permissible. "sshd" normally uses syslog to store log output in "/var/log/auth.log" at the "LogLevel INFO" priority level. Additional log output is useful for troubleshooting.
Figure 1: The server transmits its public key to the client (1), the client compares it to the expected key (2) and authenticates the server (3, 4) using its private key. Any ensuing data will be encrypted before transmission (5); this includes any passwords that may be used
Use "VERBOSE" to choose the level immediately above "INFO", or failing that "DEBUG", to locate the cause of failed connections on the server side (lines 24 and 25). When configuring the server you can specify which protocol version is announced during the handshake. SSH2 has been the default since OpenSSH 2.9, followed by SSH 1: "Protocol 2,1" (line 9). To permit SSH2 only, you will need to change this line to "Protocol 2".
Authentication Procedure
Both protocol versions provide SSH with various methods of authenticating users. The most sophisticated method is the public key approach, which you enable using "PubkeyAuthentication yes" (line 35). In this case users store their public keys in their home directories and authenticate using the corresponding private key. Server defaults and the OpenSSH packages provided by many
distributions also permit password based authentication schemes, which are enabled using "PasswordAuthentication yes" (line 50). Very few admins change this default, which leads to many users working with "ssh" without ever thinking about SSH's best features. If you require strict logon security, you can specify "PasswordAuthentication no". SSH encrypts any passwords that cross the wire, but their inherent weaknesses still apply (too short, too easy to guess, rarely changed, and so on). There have been some attempts to guess passwords indirectly by performing timing analysis on the encrypted data, thus removing the need to decrypt the data (the SSH Keystroke Timing Attack). Password based login can only be disabled entirely by additionally stipulating "PAMAuthenticationViaKbdInt no" (compare line 58). Insecure rhosts authentication is disabled by default for the server (lines 38 through 41). Trusted host authentication means that the "/etc/hosts.equiv" and "~/.rhosts" files list those hosts considered trustworthy enough to log on without authenticating – the server trusts the client computer and the authentication process that has occurred on the client. Using rhosts, the server only checks the IP address and port number of the client. If the port number is below 1024, the client process on Unix hosts must be root equivalent. This is intended to prevent simple user programs from spoofing their identity to the server. However, the whole model works on the assumption that the IP address is genuine – an assumption that cannot be safely made in today's networks.

SSH History
The designer of the protocol and the author of the first implementation was Tatu Ylönen, who went on to found SSH Communications Security Ltd. He released SSH 1.0 for Unix in June 1995. Ylönen's software was freely available up to version 1.2.12, but licensing became increasingly restrictive. Two variants of version 1 of the protocol were developed and released as versions 1.3 and 1.5. SSH.com stopped maintaining and developing the commercial implementation of protocol version 1 in May 2001.

Offshoots – OpenSSH
The OpenSSH implementation was originally based on Ylönen's SSH 1.2.12 sources, which were provided without any restrictions on their use; however, it uses the free SSL implementation, OpenSSL, as its cryptographic base. The OpenBSD Group is responsible for the OpenSSH project, although a whole group of independent developers are now involved in it. The basic version of OpenSSH (at 3.1 as of this writing), which runs on OpenBSD systems only, has been ported to a variety of platforms. A modified version, referred to as portable version 3.1p1, was implemented for this purpose. As of version 2.1.0 (dated May 2000) OpenSSH can handle SSH version 2 in addition to version 1. In contrast to commercial SSH, both versions are available in a single server binary. If an older client attempts to connect to the server, the server switches to compatibility mode as defined in the IETF "SSH Transport Layer Protocol" draft. The server indicates its compatibility mode capability by means of a handshake string, "SSH-1.99". You can "telnet host 22" to view the string, or simply type "ssh -v host" ("Remote protocol version 1.99" is displayed in this case). Version 2 of the protocol is documented in various IETF drafts; the architecture is described in "draft-ietf-secsh-architecture-09.txt". All current drafts are available from the IETF website [3].
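Picking up the point about password logins above: in practical terms, a key-only policy boils down to the two directives discussed there. A minimal sketch of the relevant "sshd_config" lines:

PasswordAuthentication no
PAMAuthenticationViaKbdInt no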
Reciprocal Trust Relationships between Hosts
SSH provides a far superior variant on the host based authentication scheme. The client program is required to authenticate with the host key of the client computer. The client will need root privileges to access this key (the set-UID bit needs to be set). In this case SSH also trusts the hosts listed in "~/.shosts" and "/etc/shosts.equiv"; however, this setting is also disabled by default (lines 42 through 45). In most cases you will want to avoid users logging on as root. If multiple users work with the root account, it is extremely difficult to determine which administrator has just logged on. The "PermitRootLogin no" setting (line 31) permits logging on by normal users only. If you happen to be a user with administrative privileges, you can use "su" to step up to this level temporarily. OpenSSH handles not only authentication but also the other steps involved in logging on to the system (launching a session, including any required entries in logfiles, launching shells, and so on).
Figure 2: The user "kh" logs on to the host "vaio", coming from "lux" via "ssh". The usual system messages are displayed

Figure 3: When you log on to an unknown host SSH prompts you to confirm that you want to trust the server key. "kh" fails to confirm, which leads to the connection being terminated
If you enable the "UseLogin yes" option, OpenSSH will access the system logon program to do so. This makes sense in some environments – where "login" applies restrictions of which "sshd" is not aware. However, this option also has a few security issues: CERT has reported security holes on two occasions [6].
Working with Keypairs
Public key authentication has several advantages in comparison with a simple password logon scheme – although it does mean some additional setup tasks for the user. Administrators may also need to explain one or two facts to their users; however, you would expect most users to be able to adjust. Three steps are required before logging on remotely with an RSA or DSA key:
• The user must generate a keypair (public and private keys).
• The public key must be copied to the "~/.ssh/authorized_keys" file on the remote host.
• The private key must be available on the local host.
The "ssh-keygen" tool is used to create keypairs. If you launch this tool without any arguments, it will create an SSH1 compatible RSA keypair and store it in "~/.ssh/". The files created by this syntax are called "identity" (private key) and "identity.pub" (public key). As already mentioned, these files apply to SSH 1 only, but OpenSSH also uses the version 2 protocol, which is more secure. DSA keys were introduced with SSH 2. To create a DSA keypair, use the "ssh-keygen -t dsa" command to create the "~/.ssh/id_dsa" (private key, version 2) and "~/.ssh/id_dsa.pub" (public key, version 2) files. For RSA keys for SSH 2, type "ssh-keygen -t rsa"; in this case the file names will include the "rsa" string. Any files containing private keys should only be readable and writeable by their owners. OpenSSH checks these privileges during key evaluation and refuses the connection if the privileges are too loose.
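Put together, the three steps might look like this for a DSA key – a sketch using this article's host names; the last command simply appends the public key to the remote "authorized_keys" file:

kh@lux:~$ ssh-keygen -t dsa
kh@lux:~$ scp ~/.ssh/id_dsa.pub kh@vaio:
kh@lux:~$ ssh kh@vaio "cat id_dsa.pub >> ~/.ssh/authorized_keys"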
Encrypting Keys
Access privileges for the key files provide effective protection against inquisitive users on the system; however, this is no defence against a user with root access.
SSH and Security
SSH has also (and unfortunately) had its share of (in)security incidents. Version 1 of the protocol is susceptible to man in the middle attacks, as it relies on cryptographically weak CRC-32 encoding to ensure packet integrity. Specially crafted data packets that include the correct CRC enable an attacker to inject data into an encrypted session without SSH noticing it. The "SSH CRC32 attack detection" facility was designed to detect injection attacks. However, this code was found to contain a buffer overflow that allowed remote attackers to gain root on several older versions. CERT reports that these older versions are still being systematically sought out and exploited on a large scale [4]. Version 2 completely removed the CRC vulnerability. The new protocol relies on a cryptographically robust MAC algorithm (Message Authentication Code) – to be more precise, it uses an RFC 2104 HMAC (Keyed Hashing for Message Authentication). OpenSSH documents security issues and bugfixes on the project website [5]. A recent incident showed how important it is to install the latest stable releases of programs relevant to system security: OpenSSH 3.0.2 is susceptible to an off-by-one error, which can allow an authenticated user to achieve root privileges (refer to "InSecurity News" in this issue).
If the home directory is on an NFS server, the key will even be transferred across the wire in plain text from the NFS server to the local workstation. And this would defeat any advantages gained from key based authentication – but there is a solution to this issue. Keys can be passphrase protected on creation. "ssh-keygen" uses the passphrase to encrypt the private key, ensuring that it is protected from any snoopers – not even root can decrypt this data without the passphrase. The passphrase can be modified later using "ssh-keygen -p -f ~/.ssh/id_dsa". The public key must reside in "~/.ssh/authorized_keys" on the target host. This allows the server to verify that the public key really does belong to the user attempting to log on. In the case of OpenSSH servers prior to version 3.0, a file called "~/.ssh/authorized_keys2" with the same content is also required for SSH-2 keys. As the name implies, the public key does not need to be kept secret. So the admin of the target system can safely store the key in the home directory, after ascertaining that the key really is the genuine article. Just sending an email with the key as an attachment is fairly risky; however, you could use GPG to sign the mail. Alternatively the admin user could compare the fingerprint of the received key ("ssh-keygen -l -f keyfile") with the original fingerprint. A short phone call would be sufficient to deal with this issue.
Behind the Barricades
It should now be possible to log on to the target system:

ssh -v user@hostcomputer
The “-v” option makes OpenSSH output debugging information, which can be
extremely useful if any issues occur. The SSH program running locally then prompts the user for the passphrase for the private key belonging to the target account. Assuming that the correct passphrase is entered, SSH will then authenticate the user on the target host, and the user will be placed in the Shell environment he expects (see Figure 2). The client can also specify the protocol type (SSH 2 or SSH 1) using the "-2" and "-1" flags:

kh@lux:~$ ssh -2 kh@vaio
The login does not need to be the same locally as on the remote host. Admins
will generally prefer to work as a normal user locally, but need to be root on a remote host. No problem for SSH:
kh@lux:~$ ssh root@vaio

You can also store your own public key in "~/.ssh/authorized_keys" in the home directory for root on the target system. In this case you will need to set "PermitRootLogin yes" in "sshd_config" (line 31 in Listing 1). Keypairs are not only available for users, but also for hosts (see Figure 1). This allows the client to verify that it is really connected to the required server. To do so, during connection setup the remote SSH daemon transmits its public key to the client and authenticates using its own private key.

The "known_hosts" Security Database
The client stores the host key in the text file "~/.ssh/known_hosts". The SSH 2 drafts specify that, in the case of unknown servers, SSH clients must request confirmation by the user that the user really does want to connect to the target. If the user cancels, the connection is terminated (see Figure 3). Users should avoid typing "yes" without considering their options at this point – after all, this prompt is one of the major security features of SSH.
Listing 1: Server Configuration "sshd_config"

01 # $OpenBSD: sshd_config,v 1.42 2001/09/20 20:57:51 mouring Exp $
02
03 # This sshd was compiled with PATH=/usr/bin:/bin:/usr/sbin:/sbin
04
05 # This is the sshd server system-wide configuration file. See sshd(8)
06 # for more information.
07
08 Port 22
09 #Protocol 2,1
10 #ListenAddress 0.0.0.0
11 #ListenAddress ::
12
13 # HostKey for protocol version 1
14 HostKey /etc/ssh_host_key
15 # HostKeys for protocol version 2
16 HostKey /etc/ssh_host_rsa_key
17 HostKey /etc/ssh_host_dsa_key
18
19 # Lifetime and size of ephemeral version 1 server key
20 KeyRegenerationInterval 3600
21 ServerKeyBits 768
22
23 # Logging
24 SyslogFacility AUTH
25 LogLevel INFO
26 #obsoletes QuietMode and FascistLogging
27
28 # Authentication:
29
30 LoginGraceTime 600
31 PermitRootLogin yes
32 StrictModes yes
33
34 RSAAuthentication yes
35 PubkeyAuthentication yes
36 #AuthorizedKeysFile %h/.ssh/authorized_keys
37
38 # rhosts authentication should not be used
39 RhostsAuthentication no
40 # Don't read the user's ~/.rhosts and ~/.shosts files
41 IgnoreRhosts yes
42 # For this to work you will also need host keys in /etc/ssh_known_hosts
43 RhostsRSAAuthentication no
44 # similar for prot. version 2
45 HostbasedAuthentication no
46 # Uncomment if you don't trust ~/.ssh/known_hosts for RhostsRSAAuthentication
47 #IgnoreUserKnownHosts yes
48
49 # To disable tunneled clear text passwords, change to no here!
50 PasswordAuthentication yes
51 PermitEmptyPasswords no
52
53 # Uncomment to disable s/key passwords
54 #ChallengeResponseAuthentication no
55
56 # Uncomment to enable PAM keyboard-interactive authentication
57 # Warning: enabling this may bypass the setting of 'PasswordAuthentication'
58 #PAMAuthenticationViaKbdInt yes
59
60 # To change Kerberos options
61 #KerberosAuthentication no
62 #KerberosOrLocalPasswd yes
63 #AFSTokenPassing no
64 #KerberosTicketCleanup no
65
66 # Kerberos TGT Passing does only work with the AFS kaserver
67 #KerberosTgtPassing yes
68
69 X11Forwarding no
70 X11DisplayOffset 10
71 PrintMotd yes
72 #PrintLastLog no
73 KeepAlive yes
74 #UseLogin no
75
76 #MaxStartups 10:30:60
77 #Banner /etc/issue.net
78 #ReverseMappingCheck yes
79
80 Subsystem sftp /usr/libexec/sftp-server
The fingerprint can easily be used to verify the key by calling the admin on the target host. The admin can display the fingerprint of the original host key by typing "ssh-keygen -l -f keyfile". Users should only place keys in their "known_hosts" files if these fingerprints match (Figure 4). Verifying the server host key provides protection against "man in the middle" attacks, where the attacker will manipulate DNS or ARP, or spoof the IP address of the genuine server, so as to impersonate that server. At the same time, the attacker connects to the real server and relays the data without the user noticing any difference. Cryptography can protect your users against attacks of this kind, but only if they play the game. If the client has no data on the server, it cannot authenticate the server. If the SSH client already knows the genuine server's public key – that is, the key is stored in "known_hosts" – the client can automatically detect an attack. The attacker will not know the original secret key and will thus be unable to successfully use the public key of the required target; instead, attackers would be forced to transmit their own keys. On comparing the key with the entry in its key ring the client would notice a discrepancy, warn the user (Figure 5) and cancel the connection. However, the warning can be harmless – if the server key has really changed. This occurs when an admin user generates a new key after a disk crash where a backup is not available, after a hardware replacement, or simply after reinstalling SSH without saving the old key. The warning will be issued to the client until the user removes the old
server key from “known_hosts”. The next time the user connects she will again be prompted to confirm the identity of the server (Figure 4).
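By way of example, the fingerprint of the server's RSA host key could be displayed like this – the path matches Listing 1; adjust it to your installation:

ssh-keygen -l -f /etc/ssh_host_rsa_key.pub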
Key Management
An admin may prefer not to bother her users with this procedure and instead maintain a global "/etc/ssh/ssh_known_hosts" file. If a host key changes, the admin user can modify the entry in the list of public keys. Key authentication schemes like the one described may make connections safer, but you still need to input a passphrase instead of a password. The "ssh-agent" and "ssh-add" programs make short work of this onerous task. The SSH agent is a kind of cache that provides access to decrypted private keys. It runs as a daemon and is available only to the user that launched it. The daemon communicates over Unix domain sockets, using the environment variables "SSH_AUTH_SOCK" and "SSH_AGENT_PID" to let its child processes know which socket it will use. The X11 init scripts in many distributions launch the SSH agent as the parent process of the X11 session, making it available to every X11 terminal. You can verify this by typing the following command:

kh@lux:~$ set | grep SSH
SSH_AGENT_PID=2097
SSH_AUTH_SOCK=/tmp/ssh-XX70h6xH/agent.2062
If the SSH agent is not running, and is only required in a single Shell, you can use the following syntax:

kh@lux:~$ ssh-agent $SHELL
The SSH agent launches the "$SHELL" subshell with the required environment variables and then retires to the background. Now you can use "ssh-add" to save any number of decrypted private keys in the cache:

kh@lux:~$ ssh-add ~/.ssh/id_dsa
Enter passphrase for /home/kh/.ssh/id_dsa:
Identity added: /home/kh/.ssh/id_dsa (/home/kh/.ssh/id_dsa)
kh@lux:~$
The keys are now available to the SSH client without having to repeat the passphrase. The following syntax displays the keys currently being cached by the SSH agent:

kh@lux:~$ ssh-add -l
1024 87:db:4c:0a:6a:c5:56:6b:74:6f:1c:8e:65:0a:ce:b2 /home/kh/.ssh/id_dsa (DSA)
kh@lux:~$
Typing "ssh-add -D" deletes the whole key cache. To remove individual keys, you can simply type the command "ssh-add -d ~/.ssh/id_dsa".
Agent Forwarding
If you log on to various hosts in succession, you might like to look into the "ForwardAgent" option. This means you can avoid storing a public/private key combination at each step of the way and launching additional agent processes. A single SSH agent on a trusted host suffices; any hosts you connect to via this host refer back to the "ssh-agent" at the top of the tree. There are three ways of enabling agent forwarding: globally using the "ForwardAgent yes" entry in "/etc/ssh/ssh_config", for individual users in "~/.ssh/config", or via the "-A" option of the "ssh" command. However, this approach does have some negative implications. Root may be able to perform a core dump to view the decrypted keys – so if you do not trust the root account, you may prefer not to save any secrets on that machine. Even if you decide against using the SSH agent, root could use a trojanized SSH client or a TTY sniffer to access the secret data when a user is entering her secret passphrase.
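For the per-user variant, a corresponding "~/.ssh/config" entry might look like this – the host name is invented for illustration:

Host gateway.example.com
    ForwardAgent yes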
Figure 4: If the user is sure that the host key really does belong to the desired target host, she can place the key in the list of trusted hosts

Figure 5: If the SSH client determines that the public key of a server has changed, it assumes a man in the middle attack and issues a warning
Checking Your Configuration
A thorough review of the default configuration provided by your distribution is particularly relevant for security programs. If you determine any discrepancies between your configuration and Listing 1, you should ensure that they were made deliberately. In case of doubt, check "man sshd". You need to restart the daemon to activate any modifications made to "sshd_config". You can use standard tools to check whether the daemon is running:

ps ax | grep sshd

Or alternatively:

netstat -tpan

By the way, when you restart the master "sshd" the forked SSH connections are maintained – no need to worry about users being logged out. To find out which process is the master process, simply view "/var/run/sshd.pid". Of course, if the daemon fails to restart, you will no longer be able to log on remotely via SSH. There is an unexpected trap for computers secured independently of SSH using TCP-Wrappers ("/etc/hosts.allow" and "/etc/hosts.deny"): "sshd" evaluates your TCP-Wrappers settings even if the daemon was launched independently of inetd, as "sshd" makes direct use of "libwrap.a". On the practical side, you can supply most of the options set in "sshd_config" as arguments when launching the program. This allows you to test the effect of various options. Provided you launch the test daemon on a different port than usual, you can even perform tests without interfering with your production server. You simply set your client to access the new port: "ssh -p port".
Secure File Transfer with "scp" and "sftp"
SSH can be used for more than just remote logins. One example of SSH's flexibility is its ability to copy files across an encrypted connection. The "scp" and "sftp" programs, which are part of the OpenSSH suite, are provided for this purpose. "scp" uses the same syntax as the less secure "rcp". To copy a local file to a target host, type "scp localfile user@host.remote:targetfile". To copy in the other direction – that is, to copy a remote file to a local host – simply type "scp user@host.remote:remotefile localtarget". Just like the interactive "ssh" tool, the command allows you to specify a variety of options, such as the protocol versions, the verbosity level, user names and levels of compression. The "sftp" program fulfills the same task as "scp", although its usage is similar to a command-line "ftp" client. You will need to enable the server subsystem "sftp-server" in "sshd_config" (last line in Listing 1). Most users should feel at home using "sftp" interactively:

lux:/tmp$ sftp root@vaio
Connecting to vaio...
sftp> pwd
Remote working directory: /root
sftp>

Gftp [8] even provides a friendly GUI for FTP and SFTP.
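By way of illustration – the file and directory names below are invented, while the hosts are the ones used throughout this article:

kh@lux:~$ scp notes.txt kh@vaio:backup/
kh@lux:~$ scp -r kh@vaio:projects .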
Backup via SSH
One feature that will appeal to administrators is the ability to perform backups across the wire via SSH. "scp" or "sftp" are not required for this task, as you can pipe SSH using the shell:

tar czvf - /the/directory | ssh user@host "cat > /tmp/foo.tar.gz"

The receiving end can also write the data directly to a tape drive:

tar cvf - /the/directory | ssh user@host dd of=/dev/tape

You should be aware that your tape drive's performance may be seriously affected. This is caused by "dd" and the tape drive both having to wait for data. If the datastream is interrupted, the tape drive has to stop and backtrack before it can carry on writing. A small tool can help solve this issue:

tar cvf - /the/directory | buffer | ssh -c blowfish root@vaio buffer -o /dev/tape

Buffer [9] spawns two separate processes that independently read data from the network and write to the tape drive, providing caching for enhanced performance. In our example we also set the OpenSSH option "-c blowfish" to enable the extremely quick but secure Blowfish encryption algorithm. Thus, OpenSSH can deal with requirements for security and speed, which are often viewed as contradictory. ■
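For completeness, the restore direction is the same pipe reversed – a sketch, assuming the archive was written to the remote tape as shown above:

ssh root@vaio dd if=/dev/tape | tar xvf -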
INFO
[1] OpenSSH project website: http://www.openssh.com
[2] SSH newsgroup: news:comp.security.ssh
[3] Current drafts on SecSH: http://www.ietf.org/ids.by.wg/secsh.html
[4] CERT Incident Note on SSH exploits: http://www.cert.org/incident_notes/IN-2001-12.html
[5] Security history of OpenSSH: http://www.openssh.com/security.html
[6] "UseLogin" vulnerabilities: http://www.kb.cert.org/vuls/id/157447, http://www.kb.cert.org/vuls/id/40327
[7] Daniel J. Barrett and Richard E. Silverman, "SSH: The Secure Shell", O'Reilly 2001, http://www.snailbook.com/
[8] Gftp, GUI for SFTP: http://gftp.seul.org/
[9] Buffer: http://packages.debian.org/testing/utils/buffer.html
[10] SSH FAQ: http://www.employees.org/~satch/ssh/faq/ssh-faq.html
[11] Beginner-friendly series on OpenSSH: http://www.mandrakeuser.org/docs/secure/sssh.html
SYSADMIN
MRTG
Network Management with MRTG
All Dressed Up

MRTG's speciality is monitoring network traffic and displaying the results graphically. The Multi Router Traffic Grapher also retrieves miscellaneous SNMP variables and can be customized to fulfill even the more exotic desires of the administrator. BY WILHELM BOEDDINGHAUS
Even the dry kind of statistics that SNMP agents continually create can be appreciated at a single glance when visualized. The Multi Router Traffic Grapher (MRTG) [1] by Tobias Oetiker is the classic tool in this area. Released under the GPL, MRTG monitors network traffic, queries routers and switches, and creates concise graphs of the collated data, embedding them in a website. But MRTG has another couple of tricks up its sleeve, such as querying network interfaces for error messages, or monitoring hard disk loads. The following examples are based on a Linux router running SuSE Linux 7.3, fitted with two network interface cards and a single hard disk. It is MRTG's job to monitor the network traffic, while at the same time watching out for NIC errors and hard disk capacity issues on a single partition. Various Linux distributions include both MRTG and the additional programs and libraries it requires. If this happens not to be true of your distribution, you should not find it too difficult to perform
a manual installation. To do so you will need Perl, a C compiler and the GD library by Thomas Boutell. This library in turn requires the Libpng and Zlib libraries. All of these components will run on Linux, Unix and even Windows.
Auto-Configuration
MRTG needs a configuration file for each device it is to monitor; however, the file can be generated automatically. The "cfgmaker" program from the MRTG package writes a configuration file that allows MRTG to monitor network traffic. "indexmaker" creates an HTML index page containing an overview of the devices being monitored. Using the standard configuration file created by "cfgmaker", MRTG will use SNMP to monitor the network interfaces of the devices in question. It does not matter whether you are dealing with a server equipped with a single network interface or a router or switch with multiple interfaces.
The prerequisites are that the device you are monitoring can produce SNMP data and that MRTG has read privileges for SNMP on that device. You will need to supply all of the following information on the "cfgmaker" command line:
• the IP address or DNS name of the device to be monitored, "192.168.33.1" in our example,
• the community string: "secret",
• the name of the config file you want to create: "/usr/local/mrtg/linux.cfg",
• the path for storing the HTML pages: "/usr/local/mrtg/html",
• and the additional option "growright".
MRTG stores the graphs and the collected data in the same directory as the HTML pages. If you intend to access the pages via a Web server, then the server will also need read access to the HTML directory.
Figure 1: The MRTG graph shows the network has enough capacity. The maximum load is a mere 1.7 percent (lower left value, "Maximum")

Figure 2: The partition originally contains 2829 Mbytes of data, but this value drops to 908 Mbytes after the user tidies up her hard disk
The default setting displays newer values on the left of the graph, but you can place them on the right by setting the "growright" flag. We want to display the background in light gray and use the interface name, "eth0" for example, instead of a serial number (the default) to describe the interface:

cfgmaker \
  --output=/usr/local/mrtg/linux.cfg \
  --global "workdir: /usr/local/mrtg/html" \
  --global "Language: english" \
  --global "options[_]: growright" \
  --global "Background[_]: #eeeeee" \
  --ifdesc=descr \
  secret@192.168.33.1
The results are stored in "/usr/local/mrtg/linux.cfg". The name of this file is passed to MRTG as the first argument: "mrtg /usr/local/mrtg/linux.cfg". This syntax creates an HTML page for each interface, logfiles containing the acquired data, and the graphs in the "/usr/local/mrtg/html/" directory. You will need to create this directory before you enter the command. MRTG automatically deletes older data – this results in an error message when you first launch the program, but you can safely ignore the message. Check out Figure 1 for the results. The MRTG command needs to be launched every five minutes as a cron job. This interval is important to allow MRTG to calculate mean values correctly. You can set the interval in the configuration file.
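A matching crontab entry might look like this – a sketch; the path to the mrtg binary may differ on your distribution:

*/5 * * * * /usr/bin/mrtg /usr/local/mrtg/linux.cfg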
Manual Configuration
If you require SNMP variables other than the transferred volume of data, you must write the configuration file by hand. The example in Listing 1 queries the errors that arise on a network interface. The global options correspond to those for the data transfer; the other fields are explained in Table 1. The call, again controlled by cron, should take place every five minutes.

External Data Sources
Instead of using SNMP directly, you can also integrate your own scripts and programs that acquire data externally and pass it on to MRTG. This allows you to visualize metrics without using an SNMP agent. It is often easier to use an existing SSH interface to a machine than to set up an SNMP agent on the machine. The article on OpenSSH in the "Sysadmin" column of this issue shows you how to use cron to call SSH without cron needing to store the password or passphrase. A script can also collate statistics from multiple sources or process the statistics in some other way. The following example uses a Perl script (see Listing 2) to ascertain how much space has been used up on the first hard disk partition. To do so, the script needs to work with two SNMP variables:
• the block size is stored in the "hrStorageAllocationUnits" variable,
• the number of blocks used is stored in "hrStorageUsed".
The script uses the Perl module "Net::SNMP" to retrieve these two values and then calculates the disk space used in Mbytes. The script expects both the host name and the community string as arguments. To call the Perl script, the "Target" entry in the MRTG configuration file (see Listing 3) must include the name of the script and the arguments in backticks "`". The first parameter is the host address "192.168.33.1", followed by the community string "public".

Differences
The "gauge" option tells MRTG to store the value exactly as it was read, instead of storing the difference to the last value. If you are attempting to obtain a data transfer statistic, you would need the difference between the two values, as SNMP agents accumulate the total number of packets transferred, i.e. the value increases continually. An absolute value is appropriate for a hard disk statistic. The "noo" (no output) entry stops MRTG from outputting the second value; "noi" would prevent the first value from being displayed. However, the script still needs to pass both values. MRTG expects four lines of output: first value, second value, system uptime, and name of the system.
Table 1: Configuration Options

Target – With the Target keyword you tell MRTG which SNMP variables it should monitor. The Target keyword takes arguments in a wide range of formats.
Options – The Options keyword allows you to set some boolean switches; nopercent is used when you do not want to print usage percentages.
Title – Title of the produced HTML page.
MaxBytes – The upper limit necessary in order to scale the axes and to compute percentages.
YLegend – The Y-axis label of the graph.
ShortLegend – The units string.
LegendI – The string for the colour legend for incoming traffic.
LegendO – The string for the colour legend for outgoing traffic.
Legend1, Legend2 – The strings for the colour legend.
PageTop – Things to add to the top of the generated HTML page. Note that you can have several lines of text as long as the first column is empty.
Listing 1: MRTG showing network errors

WorkDir: /usr/local/mrtg/html
Language: english
Background[_]: #eeeeee

Target[interfaceerrors_2]: 1.3.6.1.2.1.2.2.1.14.2&1.3.6.1.2.1.2.2.1.20.2:secret@192.168.33.1
Options[interfaceerrors_2]: growright,nopercent
Title[interfaceerrors_2]: Error Interface eth0
MaxBytes[interfaceerrors_2]: 10000
Ylegend[interfaceerrors_2]: Error
ShortLegend[interfaceerrors_2]:
Legend1[interfaceerrors_2]: Input Error
Legend2[interfaceerrors_2]: Output Error
LegendI[interfaceerrors_2]: INPUT
LegendO[interfaceerrors_2]: OUTPUT
PageTop[interfaceerrors_2]: <H1>Input / Output Errors</H1> Error on Interface eth0

Target[interfaceerrors_3]: 1.3.6.1.2.1.2.2.1.14.3&1.3.6.1.2.1.2.2.1.20.3:secret@192.168.33.1
Options[interfaceerrors_3]: growright,nopercent
#... additional entries for eth1
Listing 2: External Script

#!/usr/bin/perl -w
# MRTG script: calculates amount of disk space used
use Net::SNMP;

# Object IDs of SNMP variables
my $uptimeOID = '.1.3.6.1.2.1.1.3.0';
my $nameOID   = '.1.3.6.1.2.1.1.5.0';
# hrStorageAllocationUnits
my $unitsOID  = '.1.3.6.1.2.1.25.2.3.1.4.1';
# hrStorageUsed
my $usedOID   = '.1.3.6.1.2.1.25.2.3.1.6.1';

# Open the SNMP session
($session, $error) = Net::SNMP->session(
    Hostname  => $ARGV[0],
    Community => $ARGV[1]);
die "Session error: $error" unless ($session);

# Uptime and name
$result = $session->get_request($uptimeOID);
$uptime = $result->{$uptimeOID};
$result = $session->get_request($nameOID);
$name   = $result->{$nameOID};

# Block size and number of blocks in use
$result = $session->get_request($unitsOID);
$units  = $result->{$unitsOID};
$result = $session->get_request($usedOID);
$used   = $result->{$usedOID};

# Convert space used to Mbytes
$usedMB = int(($units * $used) / (1024 * 1024));

# Pass values to MRTG
print "$usedMB\n";
print "0\n"; # second value is not displayed
print "$uptime\n";
print "$name\n";

Listing 3: Configuration for an external script

WorkDir: /usr/local/mrtg/html
Language: english
Background[_]: #eeeeee
Target[harddisk]: `/usr/local/mrtg/perl/mrtg-get-linux.pl 192.168.33.1 public`
Options[harddisk]: growright,noo,gauge
Title[harddisk]: hard disk usage
MaxBytes[harddisk]: 3138
Ylegend[harddisk]: MB
ShortLegend[harddisk]: MB
Legend1[harddisk]: hard disk usage in MB
Legend2[harddisk]: not used
LegendI[harddisk]: MB
LegendO[harddisk]: not used
PageTop[harddisk]: <H1>Hard disk usage</H1> harddisk "/"

INFO
[1] Tobias Oetiker, MRTG homepage: http://www.mrtg.org
[2] RRD Tool: http://people.ee.ethz.ch/~oetiker/webtools/rrdtool/
[3] MIB Central: http://www.mibcentral.com/
[4] SNMP for the Public: http://www.wtcs.org/snmp4tpc/
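Before wiring the script into MRTG, you can test it from the shell. The four output lines below are purely illustrative values (used Mbytes, the masked second value, uptime, and system name):

$ /usr/local/mrtg/perl/mrtg-get-linux.pl 192.168.33.1 public
908
0
1234567890
linuxrouter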
MRTG writes the contents of the third and fourth lines at the top of the HTML pages, prepending them to the statistics. You can see the results in Figure 2. If you want to provide many statistics with MRTG, you will be pleased with the "indexmaker" feature. This program from the MRTG package produces a summary page of the MRTG graphs, usually drawing on several MRTG configuration files.
Keeping track of things with Indexmaker
The following example defines the output file and a single-column overview. The page title is additionally supplied. The title stored in the "Title" variable in the configuration file will be used for each graph; this shows up as "hard disk usage" in our example.

/usr/bin/indexmaker \
  --output=/usr/local/mrtg/html/index.html \
  --columns=1 \
  --title "Status Linux Router" \
  --section title \
  /usr/local/mrtg/linux.cfg \
  /usr/local/mrtg/linux-errors.cfg \
  /usr/local/mrtg/linux-harddisk.cfg
The result is an HTML page with five daily graphs (Data Transfer IN, Data Transfer OUT, Errors IN, Errors OUT, harddisk). The sixth value has been masked and will be ignored by Indexmaker. Each graph is a link: just click on it to display the page with detailed graphs and statistics in your browser.
Prospects
MRTG is a flexible and powerful network and host monitoring system. However, the format used to store the statistical data prevents any further processing, as MRTG only logs the mean values required for producing weekly, monthly and annual statistics. If you look closely enough, you might be able to estimate the future load on your components, but you will be unable to make an entirely accurate prediction. RRD Tool [2] may prove a better tool for this task: the Round Robin Database was also written by MRTG's author. ■
PROGRAMMING
Tcl/Tk: BWidgets
BWidgets: New GUI Elements for Tcl/Tk
Sparkling Desktop

BWidgets add common GUI elements such as tree and notebook to the standard Tk widgets. In contrast to most alternatives they were programmed entirely in Tcl/Tk and are thus well suited to platform independent applications. BY CARSTEN ZERBST

The Tcl GUI toolkit Tk comprises a complete selection of standard widgets, such as frames, buttons and listboxes. But some applications still need more. Additional widgets, such as tree or combobox, are available in various add-ons, such as BLT [1]; however, they tend to be precompiled, and thus platform specific. BWidgets do not have this disadvantage as they were written in Tcl/Tk and can thus be distributed with the application across platform boundaries. BWidgets are based on a development by Unifix and are now available from Sourceforge [2] under the BSD license. After extracting the archive you need to make the add-on known to the interpreter. To do so, you use the library path variable, "auto_path":

lappend auto_path [file join /usr local lib BWidget-1.4.1]
The archive not only contains the add-on itself but also the comprehensive documentation, which you will find in the “BWMan” subdirectory. The interfaces are similar to those of the normal Tk widgets, however, the widget commands are capitalized.
Grouping and Organizing Widgets
Some BWidgets are used for organizing graphical desktops:
• Notebook (also referred to as a tab)
• Frame with title text
• Vertically or horizontally tiled panes with modifiable partitioning
• Scroll areas
The Notebook widget (see Figures 1a and 1b, commonly referred to as a tabbed widget) allows the user to toggle between tabs. This in turn allows for more intuitive use. You can use the following syntax to add a tab:

set frame [$notebook insert end name -text "Example"]
The new frame can accommodate additional widgets. Tabs can of course be accessed via the mouse. The program can also specify the foreground tab:

$notebook raise name
Tabs can also be queried, sorted or deleted, but notebooks have limitations if you need to add a larger number of entries. Depending on the tab titles, there may not be enough room to display even four or five tabs simultaneously. The widget will then display buttons that allow you to click through the tabs (Figure 2), although this will affect the application’s handling. You can solve this issue by using a page manager with a menu or a toolbar
that allows the user to toggle the available tabs. Strict division of desktop areas, as provided by notebooks, is not always necessary. A title frame – a frame with a title bar, as shown in Figure 1a – is slightly less obtrusive. However, Tk 8.4 supports the more powerful label frame, which is easier to use and allows you to assert more influence on its appearance. You can even use widgets in the title.
Divide and Conquer
Horizontally or vertically tiled areas are common GUI elements. The user can then use a button (sash) to determine the space assigned in each desktop area. The widget to use in this case is the
paned window. The “-side” option allows you to specify the tiling direction and the position of the sash. Just like the “grid” command you use the “-weight” option to specify the tiling proportions. You can add an area to a paned window by typing “set frame [$panedWindow add]”, and then place your own widgets in the new frame.
Automatic Scrollbars
When you move a paned window, at least one area is normally too small to accommodate all the widgets the window contains. But there is an elegant solution to this issue – that is, to use a scrolled window ("$scrolledWindow setwidget $child") that automatically displays scrollbars when required. However, only a few widgets, such as "text" or "canvas", will work with scrollbars. The scrollable frame defines a scrolling BWidget container capable of accommodating any widget. The BWidgets package traditionally duplicates some normal Tk widgets. We will not be looking into these widgets, as there is virtually no difference between their functionality and that provided by the native Tk widgets.
Needful Things
The additional input features not available in normal Tk – Combobox and Spinbox – are particularly interesting. A spinbox will become available in Tk 8.4, but this still leaves the combobox as an important input widget (see Listing 2). The widget allows the user to choose from a list of presets defined by the "-values" option. The selection list can be predefined once or – as shown in line 8 of our example – you can use a callback to update the list before displaying it. If the user selects a new value, the widget calls a further callback function. The ".combo getvalue" function (line 12) will return the selected item, allowing you to access the element via "lindex". As line 25 of the sample program shows, the selected element can also be controlled via Tcl. In addition to the numeric position ".combo setvalue @Position", "last", "first", "next" or "previous" are also available. If the "-editable true" flag has been set, the user can also use the input field to type a value directly. However, the callback function is not used in this case.
Figure 1a: Users can toggle between tabs in the notebook widget

Figure 1b: The individual tabs can accommodate widgets, even paned windows

Figure 2: If there are too many tabs, two arrow buttons allow for scrolling
Listing 1: Base Widgets

#!/usr/local/bin/wish8.3
# Base Widgets from BWidgets
lappend auto_path [file join [pwd] BWidget-1.4.1]
package require BWidget 1.4.1

set notebook [NoteBook .nb]
pack $notebook -expand true -fill both

# First notebook tab with TitleFrame
set frame [$notebook insert end tf -text "TitleFrame"]
TitleFrame $frame.title -text "Frame with title"
pack $frame.title -expand true -fill both
set f [$frame.title getframe]
label $f.label -text "an entry"
pack $f.label

# Second notebook tab with PanedWindow,
# ScrolledWindow and ScrollableFrame
set frame [$notebook insert end sp -text "PanedWindow"]
set panedWindow [PanedWindow $frame.pw -side top]
pack $frame.pw -expand true -fill both

set pane [$panedWindow add -weight 1]
set sw [ScrolledWindow $pane.sw]
set text [text $sw.text -wrap none -width 50 -height 50 -bg white]
$text insert 0.0 "right text in ScrolledWindow"
$sw setwidget $text
pack $sw -fill both -expand yes

set pane [$panedWindow add -weight 9]
set sw [ScrolledWindow $pane.sw]
pack $sw -fill both -expand yes
set sf [ScrollableFrame $sw.sf]
$sw setwidget $sf
set f [$sf getframe]
label $f.label -text "A label in a ScrollableFrame"
pack $f.label

# foreach t {a b c d e f g} {
#     $notebook insert end $t -text "Tab $t"
# }

$notebook raise [$notebook page 0]
wm title . "Order"
wm geometry . 200x200
The drag & drop mechanism is a special case. The current Tk version does not contain a native implementation;

Listing 2: Combobox

#!/usr/local/bin/wish8.3
# Example of BWidget Combobox
lappend auto_path [file join [pwd] BWidget-1.4.1]
package require BWidget 1.4.1

proc selectionUpdate {} {
    .combo configure -values [glob -nocomplain *]
}

proc valueChange {} {
    set index [.combo getvalue]
    set selection [.combo cget -values]
    .value configure -text [lindex $selection $index]
}

ComboBox .combo -postcommand selectionUpdate \
    -modifycmd valueChange -editable false \
    -entrybg white
label .value
grid .value .combo -sticky ew -padx 10

selectionUpdate
.combo setvalue @0
valueChange

wm title . "ComboBox"
however, you might prefer to use George Petasis’ “tkdnd” [3] add-on, which supports native X11 and Windows drag & drop and is capable of communicating with Gnome or KDE applications.
Top of the Tree
Besides the structural elements, the tree display is one of the most important features in BWidgets. Let's look at a simple file browser in Listing 3 as an example (see also Figure 3). Line 10 creates the widget and specifies both the colors and a callback function that is called on opening and closing a node. Trees can become extremely large, and this is why line 7 again reverts to the scrolled window. The first step is to insert a node into the tree; this step is performed using the command:

$tree insert index parent name

The index accepts the same values as the "lindex" command, mainly "end". The parent field designates the node under which the new node will be inserted – this will be "root" for the first node. The node name can be any string; however, the name must be unique amongst the tree widgets. The node name is usually derived from the data to be displayed, although a serial number can be used. There are a few options available for nodes in addition to the text and image to be displayed, particularly "-data value", which can store any additional data in the node.

Figure 3: This file browser was implemented using the BWidgets tree and shows the contents of a directory when you open the directory

Current Content Only
The callback function "nodeOpen" (line 27) first deletes all the child nodes and then recreates them when a node is opened – thus ensuring that the file browser will always display the current content of the directory. If a user wants to use the mouse to open a subtree, she will need to click the small box next to the node. The box is either drawn automatically (as in the case of the first node) or explicitly using the "-drawcross always|never|auto" option. Most applications will require the user to select a node. You can use "$tree bindText Event Callback" or "bindImage" to bind a callback function to an event that occurs for the text or the node icon. The callback function is responsible for raising the selected element. The "$tree selection" subcommand – with the subcommands "set", "get", "clear", "add" and "remove" – takes care of this. The "selected" procedure (line 56) points the selection to a specific node and displays the name and size of the file by reference to the "::fileInfo" variable (the variable is displayed in the status line, see line 19).
Breaking News
On September 5 Tcl/Tk finally went to version 8.4. The sources for the beta version are available from [4], and pre-compiled packages for Linux from Activestate [5]. We will be looking at the new version's features in our next issue. You can download the presentations from this year's European Tcl/Tk meeting, which was held in Munich, from Michael Haschek's web site [6]. A variety of topics were discussed, ranging from e-learning to 3D graphics. The latter also looked into a new Tcl add-on by General Motors [7] that supports both tensors and the production of 3D graphics. By the way, Tcl/Tk users in Munich are currently founding their own user group.
Figure 4: The Icon package by Adrian Davis not only contains a large number of icons, but also a browser that you can use to view the icon pool
The BWidgets package comprises a number of icons that can be queried using the (undocumented) Bitmap command. For improved ease of use, Adrian Davis has compiled a number of freeware icons – from KDE, for example – in his "icon" package [8]. He also provides a browser for this purpose (see Figure 4).
Listing 3: Tree Widget

#!/usr/local/bin/wish8.3
# The BWidgets tree widget
lappend auto_path [file join [pwd] BWidget-1.4.1]
package require BWidget 1.4.1

set sw [ScrolledWindow .sw -relief sunken -borderwidth 2]
grid $sw -sticky nesw
set tree [Tree $sw.tree -background white \
    -selectbackground LightSkyBlue \
    -opencmd nodeOpen \
    -closecmd nodeClose]
$sw setwidget $tree
$tree bindText <Button-1> selected
$tree bindImage <Button-1> selected

label .label -textvariable fileInfo -anchor w
grid .label -sticky ew
grid columnconfigure . 0 -weight 10
grid rowconfigure . 0 -weight 10
grid rowconfigure . 1 -weight 1

# Callbacks
proc nodeOpen {node} {
    # Swap icon
    $::tree itemconfigure $node -image [Bitmap::get openfold]
    # delete old child nodes
    $::tree delete [$::tree nodes $node]
    # Directory of the node
    set path [$::tree itemcget $node -data]
    # Create nodes for all children
    foreach child [glob -nocomplain [file join $path *]] {
        if {[file isfile $child]} {
            set icon [Bitmap::get file]
            set dc never
        } else {
            set icon [Bitmap::get folder]
            set dc always
        }
        $::tree insert end $node $child -data $child \
            -text [file tail $child] \
            -image $icon -drawcross $dc
    }
}

proc nodeClose {node} {
    $::tree itemconfigure $node \
        -image [Bitmap::get folder]
}

proc selected {node} {
    $::tree selection set $node
    set path [$::tree itemcget $node -data]
    set ::fileInfo "[file tail $path], [file size $path] bytes"
}

# insert first node
$tree insert end root pwd -data [pwd] -text [pwd] \
    -image [Bitmap::get folder]
# ... and open it
$tree opentree pwd false

wm title . "FileBrowser"
Good Reasons In addition to the features already mentioned, BWidgets also contains a
INFO [1] BLT: http://incrtcl.sourceforge.net/blt/ [2] BWidgets: http://tcllib.sourceforge.net [3] Tkdnd: http://www.iit.demokritos.gr/~petasis/ [4] Tcl: http://www.tcl.tk/software/tcltk/8.4.html [5] Active Tcl: http://aspn.activestate.com/ASPN/Downloads/ActiveTcl/ [6] Presentations: http://www.t-ide.com/tcl2002e.html [7] TK3D: http://www.gm.com/automotive/innovations/rnd/TK3/TK3D_Software_Description.html [8] Icon: http://www.satisoft.com/tcltk/icons/
INFO
[1] BLT: http://incrtcl.sourceforge.net/blt/
[2] BWidgets: http://tcllib.sourceforge.net
[3] Tkdnd: http://www.iit.demokritos.gr/~petasis/
[4] Tcl: http://www.tcl.tk/software/tcltk/8.4.html
[5] Active Tcl: http://aspn.activestate.com/ASPN/Downloads/ActiveTcl/
[6] Presentations: http://www.t-ide.com/tcl2002e.html
[7] TK3D: http://www.gm.com/automotive/innovations/rnd/TK3/TK3D_Software_Description.html
[8] Icon: http://www.satisoft.com/tcltk/icons/
THE AUTHOR
Carsten Zerbst works for Atlantec on the PDM ship building system. He is also interested in Tcl/Tk usage and applications.
PROGRAMMING
C Tutorial
C: Part 11
Language of the ‘C’
In this article, Steven Goodwin takes us on a journey which results in us breaking our project into pieces! BY STEVEN GOODWIN
The pre-processor is a small preparation language that runs before the main C compiler and amends the given source by performing tasks such as conditional compilation, macro substitution and file inclusion. Its integration with the C language is so tight that, within the Linux environment, they’re no longer separate programs! What follows will give details of the available commands (known as directives) and how they’re useful to C.
White Ladder
Any line that starts with a hash (#) is intended for the pre-processor, including our beloved #include that starts so much of our code. Some people use whitespace between the hash and the word include for indentation; others use a space before the hash symbol to indent. Either is acceptable under most modern compilers, including GCC. A pre-processor directive must be the first thing on a line, but it can appear anywhere within a file, even in the middle of functions. We place directives at the top of the file, which makes sure everything in the file is affected – the position is important: a pre-processor directive that appears half way down the file will only have effect for the second half of the file.
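A tiny, made-up fragment illustrates why position matters:

int before(void) { return LIMIT; }  /* error: LIMIT is unknown here */

#define LIMIT 10

int after(void)  { return LIMIT; }  /* fine: expands to 10 from here on */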
New Life
Include has been our friend since the first instalment. It incorporates header information into our source file, allowing us access to common structures without
us re-inventing the proverbial wheel each time. Since your fingers are probably tired of typing ‘#include’ I shall include only a minimum of examples! When a file is included, the contents of that file are incorporated into our source verbatim – whether it’s genuine C code or not. There are two variations, one of which we haven’t covered yet.

#include <stdio.h>
#include "stdio.h"
In the first case, the pre-processor looks for a file called stdio.h in the usual places (/usr/include, /usr/local/include and so on). No surprises there. However, with the second example it will look in the current, local, directory for a file of that name before anywhere else. This lets us build up our own library of commonly used routines and place them within our own home directory, without needing root privileges to install them into /usr/include. You can use absolute or relative paths within the file name. Relative paths are interpreted from the including file’s directory: if file A includes file B in a different directory, then any include in file B must be relative to the directory in which B resides. Absolute paths are rarely used because porting becomes more awkward. It is possible to include a file more than once (and even include one header from inside another). However, since anything inside the header then gets included twice, the compiler will see two (or more) declarations of certain structures and complain. To get around this, all header files are guarded. We’ll see how this works shortly.
ParkLife
Before we march on, two general pre-processor features deserve a mention. The first involves the use of comments: they are ignored. The pre-processor understands C style comments, ignoring them completely just as the compiler would, so they will not be interpreted in any directive (such as include).

#include <stdio.h> /* the pre-processor can't see me! */
The other feature is line continuation. Although it is unlikely that you will ever need to split the include instruction over two lines, it is possible to do so by using a backslash as the very last character on a line (nothing, not even whitespace, may follow it). This will cause the pre-processor to rejoin the lines internally.

#include \
<stdio.h>
This continuation works for all pre-processor directives, not just include.
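Where it really earns its keep is with longer macro definitions – a hypothetical example (the AREA name is ours):

#define AREA(w, h) \
        ((w) * (h))

The pre-processor sees this as the single line #define AREA(w, h) ((w) * (h)).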
Mack the Knife
Macro substitution is performed with the #define command. It can be used with or without parameters. Like functions, if parameters are used, then the same number of parameters must be supplied for the macro to be expanded properly.

#define TRUE      1
#define PI        3.1415926f
#define SQUARE(x) ((x) * (x))
In each case above, the pre-processor works through the source code and (blindly) replaces the macro name on the left with the macro text on the right. It does nothing more complex than that! By convention, names are always upper case, which minimizes the chance of a macro conflicting with a variable name or
function (written, by convention, in lower case). The macros do not get expanded within quoted text, but will work within expressions. That means you can write:

iBiggerNumber = SQUARE(iBigNumber);
Which the pre-processor will expand to:

iBiggerNumber = ((iBigNumber) * (iBigNumber));
This introduces a couple of interesting (and so often quoted) problems with macros that can quite easily break code. Or rather, code can be written that can quite easily break the macro! Consider the following:

iBiggerNumber = SQUARE(iBigNumber++);
Since the ++ is inside the macro it too will get substituted, thus:

iBiggerNumber = ((iBigNumber++) * (iBigNumber++));
This causes iBigNumber to grow by two (something that was not expected) and iBiggerNumber to be hopelessly wrong! This is the reason for the (apparently excessive) brackets in the example above. Thinking back to the rules of precedence, imagine a case where an operation with a lower precedence than multiplication is performed inside the macro.

#define BAD_SQUARE(x) x*x

iBiggerNumber = BAD_SQUARE(i+1);

This expands to:

iBiggerNumber = i+1*i+1;

Which evaluates to (check the precedence table if you need to):

iBiggerNumber = i+(1*i)+1;

And will not produce the correct result (unless ‘i’ just happens to be 0.4142135623731 or 2.414213562373!). In some cases the code will cause a compiling error:

iBiggerNumber = BAD_SQUARE(i=iBigNumber); /* ERROR */

This example produces an ‘invalid lvalue in assignment’ error, whereas a function with the same name would work without any problem. Generally, when a macro is trying to look like a function it should mimic a function as closely as possible. That means it must look like an expression; no statements (if, while, for) are allowed (since statements can not be part of an expression), it must return a value, and it must have no side effects. It should never end with a semicolon either, or you will produce odd syntax errors that are difficult to track down! Often, if there’s a problem with code (either during compilation, or at run time) within two lines of a macro, it’s highly likely that the macro is malformed, and correcting the macro (or better still, using a function) will fix the problem.

So why use macros at all? Well, functions require formal parameters. Macros do not. And so it is unnecessary to write (say) five different functions to implement one algorithm. The traditional examples at this juncture are the macros for minimum and maximum.

#define MIN(x, y) (((x)<(y)) ? (x) : (y))
#define MAX(x, y) (((x)>(y)) ? (x) : (y))

Since these can work across types (the minimum of a short and an int, for instance), they can save a great deal of work, as one simple macro does the whole job. It is possible to add comments to the end of the definition, as they will not be included as part of the macro.

Groovy Train
On occasion, you will want to define a macro that can use its parameter as a string, and not a value. The most common case is listing variables during debugging, to save typing lines like:

printf("iCount = %d\n", iCount);

The obvious solution (below) does not work, however, since the text inside quotes does not get replaced.

#define DBG(var) printf("var = %d\n", var);

And since the pre-processor will substitute the value of ‘var’ into the rest of the replacement string, the following example will not work either. Sorry!

#define DBG(var) printf("%s = %d\n", var, var);

The solution is to use the parameter’s name, as opposed to its value, with the special “stringizing” operator, which is done by prefixing the macro parameter with the hash symbol.

#define DBG(var) printf("%s = %d\n", #var, var);

It Takes Two
The pre-processor appears to have a fascination with the hash symbol, since this final macro feature makes use of two of them! It is called the token-pasting operator, and will join the names of the macro parameters into one.

#define PASTE(a,b) a##b

int g_Value = 10;
printf("Value = %d\n", PASTE(g_, Value));

Which would expand to:

printf("Value = %d\n", g_Value);

On the surface there might be little use for this esoteric feature, but constructing large structures with rigid naming conventions can be made easier by using token-pasting. It is, however, not for the faint of heart!
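Both operators can be seen in one throwaway program – entirely our own example, not part of the converter project:

#include <stdio.h>

/* #var turns the parameter into the string "var" */
#define DBG(var)    printf("%s = %d\n", #var, (var))
/* a##b glues two tokens into one identifier */
#define PASTE(a, b) a##b

int g_Value = 10;

int main(void)
{
    int iCount = 42;
    DBG(iCount);                              /* prints: iCount = 42 */
    printf("Value = %d\n", PASTE(g_, Value)); /* prints: Value = 10 */
    return 0;
}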
Policy Of Truth
If you define a macro twice with the same name, you will get a warning. Actually, you get two warnings: one saying ‘NAME’ redefined, and a second telling you the location of the previous definition. With common macros such as TRUE and FALSE (which could have been implemented in any number of different header files), this can be tricky. There are two ways around this. The first is to remove the definition before adding a new one.
#undef TRUE  /* removes TRUE from the pre-processor's memory */
#define TRUE 1
The second, often better, way is to check for an existing definition. This is done with the pre-processor instruction #ifdef. When a macro is created, its name goes into a table held inside the pre-processor, along with its macro replacement text. You can query individual entries at any time by using the instruction:

#ifdef TRUE /* TRUE has been #defined somewhere */
printf("TRUE has already been defined");
#endif
The first line of code starts a block of code that will be compiled in, should the appropriate condition be met. This block starts from the line after the #ifdef, and continues up to an associated #endif command. It can be said that the block was conditionally compiled in by TRUE.

#ifndef TRUE /* TRUE has not been #defined somewhere */
#define TRUE 1
#endif
This example defines the macro only if it has not been defined already. This is usually used for selecting build variations and guarding against repeat definitions.
My Definition
The #ifdef directive can be used to switch special sections of code in (or out) depending on the build you are doing. For instance, you might have lots of debugging messages that appear on-screen to help track the program; data you wouldn’t want visible in the final product. So, you could create a macro and place #ifndef around that particular code, like so:

#define RELEASE_BUILD

#ifndef RELEASE_BUILD
printf("Stats output...\n");
/* and so on... */
#endif
For more involved compilations we have a straightforward #if command which supports some basic operations. This will perform some evaluation on macros and supports compound expressions. It will not, however, work with strings – only integers. If you want to use strings, then do as we do for DEBUG_LEVEL below – define (and use) them as integer constants. Basic mathematics can be performed with #if, however only signed arithmetic is supported.

#define DEBUG_LEVEL 3

#if DEBUG_LEVEL > 2
printf("Complete network log:\n");
#elif DEBUG_LEVEL > 1
printf("General stats:\n");
#elif DEBUG_LEVEL > 0
printf("Basic stats\n");
#endif
We can also make use of the ‘else if’ (#elif) instruction, which is fairly similar to C’s ‘else if’ statement, although we could equally use #else since both are valid. The expression part of a #if may also use a special pseudo-function called ‘defined’, which returns TRUE (i.e. 1) if the macro in question has already been #defined by the program.
TABLE 1: MACROS
__USE_GNU: If you define this macro before including header files like string.h you will have access to special GNU-specific extensions.
__USE_ISO9X: Defining this macro gives access to functions that didn't become part of the standard C library until the ISO C 9x standard was ratified.
__DATE__: The following three macros are created automatically and are standard across all compilers. This one contains the current date as a string.
__FILE__: The current file, as a string.
__LINE__: The current source line, as an integer.
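To see the three automatic macros in action, here is a throwaway example (the TRACE name is our own):

#include <stdio.h>

/* __FILE__ and __LINE__ are filled in afresh at every use */
#define TRACE() printf("Reached %s:%d (compiled %s)\n", \
                       __FILE__, __LINE__, __DATE__)

int main(void)
{
    TRACE();  /* e.g.: Reached demo.c:9 (compiled Oct  1 2002) */
    return 0;
}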
#ifdef RELEASE_BUILD
#if defined(RELEASE_BUILD)
/* both are exactly the same */
The latter is often used when several conditions need to be tested, such as:

#if defined(RELEASE_BUILD) || !defined(DEBUG_NO_OUTPUT)
For other macros, see Table 1: Macros.
Buffalo Soldier
The other main use of #ifdef, guarding, can happen in a number of places. In the simple case above, we can guard against the TRUE macro being declared more than once with:

#ifndef TRUE
#define TRUE 1
#endif
Not an uncommon sight in header files across the land! On a larger scale it can also be used to stop header files from being included more than once, as I mentioned above. These so-called internal guards have already been placed in the headers within GCC. If you’ve ever wondered why we don’t get problems when compiling with any combination of headers, this is it. Every header file (and this applies to all header files, not just the GCC ones) should be guarded internally by using a template such as:
Pragma
One compiler directive you may see is the pragma. This allows the compiler writers to include features and extensions that are not part of the language, but may be useful (or necessary) on the target platform. On some platforms a pragma might exist to pad structures to a specific size. The language does not provide such a mechanism, but for interfacing with specific hardware it might be essential, and so it is provided as a pragma.

#pragma pack(32) /* pack subsequent structures to 32 byte boundaries */

If the pre-processor and compiler (since the compiler may need to do something with the pragma information) cannot interpret what is meant by the pragma, it is ignored, to ensure portability. Generally, though, you should not need it.
#ifndef _STDIO_H
#define _STDIO_H

/* Usual header stuff goes in here */

#endif /* this is the last line of the file */
This stops the stdio.h header from declaring its macros, structures and function definitions more than once, regardless of who types #include <stdio.h> at the top of their file! It is also possible to create an else branch with #else, but it is not needed here. Users of ‘other operating systems’ might try to influence you with easier methods like ‘#pragma once’. Ignore them! This pragma is non-standard, non-portable, and highly unlikely to find its way into gcc anytime soon, so stick with the better method outlined here! For the use and purpose of pragma please see the Box: Pragma. It is also possible to guard externally, by including the #ifndef lines around the call to #include. In practice, this is usually more trouble than it’s worth.

#ifndef _STDIO_H
#include <stdio.h>
#endif
This works in the same way as internal guards (it still needs the #define _STDIO_H inside the stdio.h file) and naturally requires the names to match. The rationale with this method is that by guarding externally you save compile time, because you do not need to open the file only to realise you do not need anything inside it. The time saved, however, is fairly small, especially under GCC, which is intelligent enough to be aware of internal guards in a file and will not open a header that has already been included. So now we know how to stop one file being included twice – let’s split a project into several files and test the theory!
Separate Lives
Let’s pretend the temperature conversion project is to grow from a 50 line shell utility to a fully-fledged interactive application! This means we should split it into several sections, making it easier to work with (since the files and compile times will be shorter, and it’s quicker for
different people to patch). Our first task is to modularise it. That is, split it into sections that perform a common set of tasks. Common sense and an understanding of the problem are all that’s necessary here. Looking back to our converter code we can determine several logical units. Each module is then given its own file and an associated header file.

Table 2: Modules
Module                          Source File   Header File
Core handling code (main)       converter.c   converter.h
Parsing the configuration file  config.c      config.h
Conversion Process              process.c     process.h
Displaying Results              output.c      output.h
Debug Output                    debug.c       debug.h

The source file contains all the code to fully implement that module, whilst the header files (similar to our friends, stdio.h and stdlib.h) act as a go-between for the different pieces of code in our program. Each header makes specific functions and structures available to code (in any source file) that wants to make use of it. To use a particular function, a source file need only include this header, and it can use it as if it were one of its own. Here is converter.h:

01 #ifndef _CONVERTER_H
02 #define _CONVERTER_H
03
04 #define MAX_CONVERSIONS 1024
05
06 typedef struct sCONVERTER {
07    char szFromUnits[32];
08    char szToUnits[32];
09    float fMultiplier;
10    float fAddition; /**/
11
12    struct sCONVERTER *pNext;
13 } CONVERTER;
14
15 extern char *g_pAppname;
16
17 #endif

Lines 1, 2 & 17 form an interior guard (as we saw earlier), so anything between lines 3–16 will only be included once. Being a header file it can be included anywhere, and it features information (defines, structures and external variables, in this case) that converter.c doesn’t mind the outside world seeing. All headers should be arranged in a uniform fashion; ‘macros, structures, prototypes’ is a wise choice since prototypes often use structures in their definitions, and structures (in turn) often include macros. Even if you don’t choose this particular order, it is better to group consistently since it makes the file neater and easier to read.

Line 15 re-introduces extern. This is short for external, and can be used to prefix either variables or functions; it means that the actual declaration for this variable or function exists outside this file. The variable g_pAppname is in converter.c (the equivalent source file), and that is where the memory for the pointer is created. If line 15 omitted the extern storage class, we would be creating a new variable every time we included the header file, creating problems later on
Code in Headers
A word about putting code in header files – don’t do it!!! Even with ‘#ifndef’ around the entire file. The problem comes not from the compiler, but the linker. When compiling, gcc will see a function (our example below uses RandomNumber) once in each source file, which is fine, since the compiler is only working with one source file at a time. The linker, however, is not. It will look at the output from two or more object files (precompiled source files) and try building them into a single program, at which point it will see two ‘RandomNumber’s, get confused, and report “multiple definition of ‘RandomNumber’” in a file called ‘/tmp/ccMLu05R.o’ (or something equally baffling!). The solution is to either define RandomNumber as a macro, or declare only the prototype in the header, and provide the implementation in a separate file that is then linked in. The latter is usually the better solution.

sillyheader.h:

#ifndef _SILLYHEADER_H
#define _SILLYHEADER_H

int RandomNumber(int iMax)
{
   return rand() / (RAND_MAX / iMax + 1);
}

#endif
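For completeness, here is a sketch of the recommended fix – the prototype lives in the header, and the single implementation lives in its own source file (the file name sillycode.c is our own invention):

sillyheader.h (fixed):

#ifndef _SILLYHEADER_H
#define _SILLYHEADER_H
int RandomNumber(int iMax);  /* prototype only - safe to include anywhere */
#endif

sillycode.c:

#include <stdlib.h>
#include "sillyheader.h"

/* The one and only definition; the linker sees it exactly once. */
int RandomNumber(int iMax)
{
   return rand() / (RAND_MAX / iMax + 1);
}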
(see Box: Code in Headers). We can also add externs (supported by functions and variables) in source files as well, enabling us to access variables from other files without including their headers.
The Model
For converter.c to access the Usage function in output.c (for example), we also use header files. But instead of including the whole function prefixed with extern (which would cause an error – see Box: Code in Headers) we need only include the prototype.

output.h (partial):
4 void Usage(void);
We first met prototypes in part one, but have since encountered them in the standard header files such as stdio.h. A prototype tells the compiler that there is going to be a function in the code, but it has yet to appear in the source. This gives the compiler enough information to allow it to be used, pending the implementation. In our examples previously, we’ve always declared the function before using it. This cuts down on magazine space and the need to write prototypes! Now, however, as the functions appear in different files it’s not possible to utilise the ‘declare-before-use’ rule, and so we must include a prototype. And that prototype should live in the header file. Because it is a prototype, the extern is implicit, so there is no need to prefix it with extern. Also, since all source files include their associated header, we don’t need to worry about the ‘declare-before-use’ rules; the header comes first, so the prototypes are always before use!
Private Investigations
By default, all functions are implicitly externed. Any function can call any other because the extern does not have to be there for it to work. This is not always desirable, as we might want to stop direct access to our own internal functions, so we use static – and process.c!

static CONVERTER *GetConversion(const char *pFromUnit)
This means we are declaring a function, GetConversion, but only want it to be
visible to other functions in this file. This is the same static keyword we used as a storage class. It is not much of a leap to see the connection. So, even if we place an extern in the header, the function will not be visible outside process.c. Prototypes for static functions are therefore placed in the source file, not the header. Finally, because a static function is not visible outside the current file, it is possible to create a different (static) function, with the same name, in a different file. As a postscript I will state the obvious – ultimately, having the source code to a project means there’s nothing to stop you removing ‘static’ from these functions, and externing them to other files. But if there are no methods (i.e. functions) to give you access to this private code, they are probably supposed to be private, and things are likely to break if you mess with them. So if there isn’t a clean way of doing what you want, you’re probably solving the wrong problem and need a better solution!
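To make the point about name reuse concrete, here is a two-file sketch (entirely our own example, not part of the converter project):

/* a.c */
static int helper(void) { return 1; }  /* visible only within a.c */
int a_value(void) { return helper(); }

/* b.c */
static int helper(void) { return 2; }  /* a different helper - no linker clash */
int b_value(void) { return helper(); }

Without the static keyword, the linker would reject the duplicate definition of helper.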
GOD
Although we have not done it here, it is permissible for a header to include another, even recursively. When debug.c needed functions from both converter.h and process.h we had to include two headers. It would certainly have been possible to re-write converter.h to include all the other necessary header files, so each source file would only need one include line.

converter.h (variation):

#ifndef _CONVERTER_H
#define _CONVERTER_H

#include <stdio.h>
#include <stdlib.h>
#include "config.h"
#include "debug.h"
#include "output.h"
#include "process.h"

#endif

debug.c (variation):

#include "converter.h"

void DbgShowConvTable(void)
{
   /* ... etc ... */
}
Like most things in coding, this has its good and bad points. On the plus side, it’s much quicker and easier to get the project working, since there is only one header file to include, making it less likely you’ll forget something, or need to revisit the file to add new headers as functionality grows. The downside is that it will take longer to compile, because every time a header file changes, each source file in the project (that includes that header file) changes as a consequence. Each source file has to combine 6, not 2, headers. Also it will take more effort to manage the inter-dependencies of headers (imagine if A must be before B, B before C, and C before A). The latter can be tricky if you are working with source files in different directories, or moderate to large sized projects. I personally take a spoonful of wisdom from each doctrine, grouping logical sections of header files together (all file handling units, for instance). Files in that group can include the specific headers they need, and those outside can refer to them all at once.
The Land Of Make Believe
Now we have our separate modules, we need a way to build them into one project. We can compile with gcc by including each source file as arguments:

gcc config.c debug.c output.c process.c converter.c -o convunit
That’s quite a buffer full to be typing after each change. What we need is a script to do this for us. But what would be better is a script that would ‘know’ what files had changed, and only compile those. That script is actually a program. And its name is make. And we’ll be looking at that next month! ■
THE AUTHOR
The language of ‘C’ has been brought to you today by Steve Goodwin and the pages 64–68. Steve is a lead programmer, currently finishing off a game for the Nintendo GameCube console. When not working, he can often be found relaxing at London LONIX meetings.
LINUX USER
Ktools
KBear FTP Client
Bear Necessities
If you are looking to use FTP for software downloads, specialized client programs offer considerable advantages in comparison to using a web browser. KBear can access several servers simultaneously, for example. BY STEFANIE TEUFEL
After first introducing the GUI based FTP program KBear over 18 months ago in this column [1], it looked as though program development had come to a complete standstill. The latest version (1.2.1) was from April the previous year – and all had been quiet on the bear-front since then. Thank goodness that seems to have changed now, and the first pre-release of version 2.0 sees the program making the jump to KDE 3.0. So, if you want to try out the latest version of KBear from the homepage at kbear.sourceforge.net, you will need KDE 3.0 and Qt 3.0.x. Updating is particularly easy for any Mandrake users, who can download a pre-configured package from prdownloads.sourceforge.net/kbear/kbear-2.0alpha1-1mdk.i586.rpm. Users of other systems will need to extract the source code archive using

tar -xIvf kbear-2.0alpha1.tar.bz2
and then launch their compiler (see Box: Compilation Tricks and Traps).
First time out
The first time that you launch KBear by typing kbear in your favorite terminal emulation, or via the KDE menu entry Internet / KBear, you will probably notice that KBear steps somewhat out of line in comparison to other FTP clients. Before you get started, this second generation program confronts you with a so-called configuration wizard designed to provide you with support while configuring the program. Keep the default settings in the first two windows, View Settings and Window Settings, for the time being, and click on the button marked Next> in both cases to continue. Later you will be able to access any items the wizard presents at this stage, and even call the wizard itself via the Settings / Run Wizard… or Settings / Configure KBear menu items. If the “useful” tips provided when you launch the program start to get on your nerves, you will probably want to access Mixed Settings and disable Run “Tip of today” at startup. If you forget to enter your email settings in Network/Email in the KDE control center, you can also use this window to inform KBear of your email address. The last wizard window is used for firewall settings (Figure 1). The default value here is Do not use a firewall (most users will use this). You will definitely
want to keep this setting, as the firewall configuration is still experimental. You can then click on the Finish button to launch the FTP client proper.
A Completely New Point of View
The first time you start KBear (Figure 2), you will not see the typical two-framed window that you are familiar with from
Figure 2: The initial window
KTOOLS
In this column we present tools, month by month, which have proven to be especially useful when working under KDE, solve a problem which otherwise is deliberately ignored, or are just some of the nicer things in life, which – once discovered – you wouldn’t want to do without.
Figure 1: Omit the Firewall in the Alpha-Version
Figure 3: Free choice of FTP servers
other clients, where you have your home directory on one side and the FTP server on the other. KBear uses a different approach even here. You will still find your home directory as expected, but most of the window is reserved for the various FTP servers that you might want to access in future.
Outgoing
Bells and whistles are fine, but an FTP client is no use unless you can convince it to talk to FTP servers of your choice. KBear provides you with two ways of doing this: you can click on FTP / Quick Connect (or use the keyboard shortcut [Ctrl-N]) and type the address of the required server in the dialog box that then appears (Figure 3). You will normally want to access public FTP servers where you do not possess an account. In this case you will need to check Anonymous Login to access that part of the server reserved for anonymous logins. After you enter your email address via the wizard, KBear automatically uses this email address as your password for anonymous login – and this is what most FTP servers expect. Note: servers will sometimes prompt you for a password despite anonymous login. In this case you should type anonymous in the dialog box that pops up on your screen – after doing so you should be able to log on without any difficulty. Are you ready? Just one more click on
GLOSSARY
FTP: The “File Transfer Protocol” controls the transfer of files from one computer to another across the Internet. FTP allows comprehensive file management after logging on to the host system. Depending on your access privileges, you can transfer, delete, copy or move files and directories.
Header file: Files ending in .h contain descriptions of how to call functions written in the C or C++ programming languages. They are essential for compiling programs. If you use a package manager to install a library, they are normally found in a separate archive with the “dev” or “devel” suffix and need to be added specifically.
Account: Access privilege for a Unix computer. A user who possesses a user name and password for this system and also has a home directory based on this system, is said to possess an account on the host.
Compilation Tricks and Traps
Alpha versions always have a few rough edges – this is particularly true of the installation and definitely the case for KBear: extreme caution and some resilience to setbacks should help. The default setting for KBear will place the client in the /usr/local/kde/bin, /usr/local/kde/lib directories, etc. This would mean the program crashing immediately on Red Hat 7.3. The solution is to install the program in the directory reserved by Red Hat for KDE under /usr. To do so, we simply called ./configure in the source directory kbear-2.0alpha1 and added --prefix=/usr.
If you have several KDE and Qt versions (including header files), you will need to point the variables KDEDIR and QTDIR to the correct version in the KBear source directory before you call configure, as in:

export QTDIR=/usr/lib/qt3
export KDEDIR=/opt/kde3

The right directory differs from system to system. But even if configure seems to work, you should not assume you have successfully compiled the tool. The compilers on SuSE 7.2 and Qt 3.0.3 refused to cooperate and instead issued a message to the effect of:

kbeardeletejob.cpp: In method `void KBearDeleteJob::slotEntries(KIO::Job *, const KIO::UDSEntryList &)':
kbeardeletejob.cpp:121: implicit declaration of function `int assert(...)'
make[3]: *** [kbeardeletejob.lo] Error 1

If a compiler complains about an implicitly declared function, it normally means that the programmer has forgotten to include a header file. If you add the following line at the beginning of the kbearlistjob.cpp, kbearcopyjob.cpp, and kbearfilecopyjob.cpp files in kbear-2.0alpha1/kbear/base

#include <assert.h>

you should be able to compile the KBear sources without any problem.
the Connect button (provided you are online) is all that separates you from a connection to the server of your choice, with a little help from KBear. The Save to Sitemanager box in Figure 4 gives you an idea of the second method of contacting FTP servers. Björn Sahlströhm, KBear’s developer, used the Sitemanager to organize a whole bunch of FTP servers – neatly and tidily by type and content – including their logins. If you decide to access one of the servers in
this list, you simply click on the list entry. You can use the above-mentioned box to add entries to the list.
Easy to import
KBear 1.2.1 veterans will be familiar with [Ctrl-O] as a shortcut to opening the Sitemanager configuration. The current version now provides the additional enhancement of allowing you to import FTP bookmarks from other programs. As you can see in Figure 4, this dialog box not only contains a bunch of nicely organized servers but also an Import button. A single click on this button allows you to use the dialog box that pops up (as you can see in Figure 5) to integrate your bookmarks from KBear 1.x, the GNOME FTP program, the ncFTP command

Figure 4: Add a server of your choice to the Sitemanager!
Figure 5: Thanks to the new import function, you can keep your bookmarks
line tool, WS-FTP and the WindowsCommander from the Windows environment (which no doubt took years to compile) into KBear 2.0. Of course you can open connections manually in the KBearSiteManagerBase window. To do so, simply select the FTP server you need, click on Connect, and you’re up and running. This is useful for users who prefer a more sophisticated approach: The protocol option allows you to select the more secure sftp variant from the Secure Shell family instead of simple ftp. You can also click on the Advanced… button to define additional options, such as the number of retries, or toggle Passive Mode (Figure 6) on and off.
Gimme, gimme!
After connecting to one or more FTP servers there is nothing to stop you downloading to your heart’s desire. But connections to multiple servers will tend to clutter the main KBear window. To prevent this, you might like to organize the layout of your windows by selecting Window / Tile. If you really want to revert to the look & feel of more traditional FTP clients, the Tile vertically option is probably your best bet. More horizontally minded users can also opt for tile overlapped (Figure 7) or tile non-overlapped. Everyone else will just have to discover their favorite option by trial and error.

Figure 6: Active or passive?

You can also use Settings / Configure KBear / Views to view the files in the folders on display as icons or as a detailed list including rights, timestamps, and so on. This is also the place to opt for a tree view, to display the taskbar and the like. After configuring the environment you simply use drag & drop for your downloads, as one would expect from KDE. Just click on the file you want to download from a directory on the FTP source server and drag your mouse to the target directory on your local machine. Let go of the mouse button and select copy in the context menu that then appears – finished. Now you can lean back and watch the transfer progress indicator in the bottom frame (Figure 8). You can even take a peek at a file before you download it. To do so, right-click on the file and select View in the drop-down menu. Depending on the file type KBear will then offer you a selection of programs to view the file contents in advance and at your own pace. ■
Figure 7: Multiple servers tiled to give some order to the window
Figure 8: Making progress with KBear. The bottom panel is indicating the rate of transfer and the percentage completed.
deskTOPia
Jo’s Alternative Desktop: ROX
RISC rocks
Not satisfied with a window manager and overawed by KDE or GNOME? Just use your window manager to create an environment! BY JO MOSKALEWSKI
Linux may look back with pride at its ten year history, but there is one thing that immediately strikes you if you venture off the beaten track of desktop environments such as KDE and GNOME: Linux just was not designed as a quick seller by marketing strategists, but by programmers interested in developing their own ideas based on a trusted workhorse. This can make one or two things seem strange to recent converts. Suddenly you are required to know what files you possess, what purpose they serve and even the best place to store them.
The User Perspective
Demands for a more “intuitive” interface increased and continue to increase: computers should be easy to use.
DESKTOPIA
Only you can decide how your desktop looks. With deskTOPia we regularly take you with us on a journey into the land of window managers and desktop environments, presenting the useful and the colorful, viewers and pretty toys.
Normal users are not interested in files but in the availability of the information they require. Of course there is some argument as to what constitutes effective use of a (Linux) computer, ranging from the purists’ prompt-based approach to banning use of the keyboard in favor of the mouse. In the long run the Unix principle of “Everything is a file” remains. If you need to access a floppy drive, for example, you simply look for /dev/fd0 and can access the medium just like a single file. Also system information is always processed in file format, as was evident in our previous deskTOPia article on ProcMeter3 [1]. What this boils down to is that an intuitive desktop must primarily provide easy file management – whether you are transferring files to a floppy, creating a link on the desktop or a program icon in the Start menu. Drag & Drop is normally just a file operation. If you look more closely, you normally discover a task ideally suited to a file manager.
GLOSSARY
Session Management: In the context of a desktop environment this generally means keeping track of the programs a user has launched, to provide the user access to a stored desktop configuration after logging off and back on.
Helping Hands
The file manager is thus a core component of any modern desktop environment. A file manager with a good range of features can supplement a window manager and hopefully provide a desktop environment that supports Drag & Drop, coordinated file management, session management and desktop icons. deskTOPia has already looked into two special file managers of this kind: DFM [2] and XFTree, the latter in the context of our article on the XFce desktop environment [3]. The ROX-Filer, the core component of the ROX desktop, offers fuller features and is easier to use, although this does place heavier demands on the supporting window manager. The window manager needs GNOME compatibility to leverage the full functionality, although the ROX desktop can be used with more basic window managers.
Tried and Trusted, but not obsolete
If you are looking for ROX on the Web, try http://rox.sourceforge.net/. This address should provide you with the latest version. The subscription disk contains the latest stable release, whereas the newer version on the ROX site has been ported to GTK+ 2 and is available as a development release (i.e. intended for developers only and not necessarily stable or bug-free). However, deskTOPia does not typically investigate experimental file managers, so we intend to concentrate on the stable version in this issue… So far ROX has been written and maintained for the most part by Thomas Leonard, who found Linux lacking in some functions that his former operating system – RISC OS – offered. The name actually expands to “RISC OS on X”; the software is freely available under the General Public License (GPL).
Installation
Every major distribution includes ready to run packages that can be installed
using the distribution’s own resources. But the sources are also available and fairly simple to install (Box 1). ROX comprises two packages: the so-called Base package creates the environment required by the ROX package proper. It contains the information required by the ROX filer on MIME types (which you may have read about in the last issue of deskTOPia [4]) and their corresponding icons, and must be installed first. The ROX package proper contains the file manager, ROX filer, which provides complete desktop functionality.
Rules and Regulations
After installing both packages you can use the rox command to launch the file manager and open a simple file manager window (Figure 1). Left-click a file to execute it; if the file is not executable, ROX will launch a suitable application and pass the file to it. Clicking on a directory will change to that directory. If you use the center button on the mouse instead of the left button, the ROX filer terminates after performing the specified task. Directories are an exception to this rule and will instead be displayed in a new ROX filer window (i.e. directories are linked to ROX).
Box 1: ROX from the source
To create ROX from the source code you will need GTK+ Version 1.2 or later, including the developer package (gtk-dev, gtk-devel or similar) and header files. You will also need to install the developer package for the glib library (often referred to as libc6) and the complete libpng library. These components should be available in any recent distribution. The whole installation procedure is extremely user-friendly. Ensure that you are the root user to extract both archives, change to the directories you have just created, and launch the install.sh scripts in those directories:

tar -xvzf rox-base-1.0.2.tgz
cd rox-base-1.0.2
./install.sh

tar -xvzf rox-1.2.0.tgz
cd rox-1.2.0
./install.sh

The typical ./configure, make and make install commands are not required at this point as install.sh will perform these tasks. Users might appreciate this help, but it does make troubleshooting more difficult if things go wrong.
The third (right) mouse key opens a context menu that allows you to delete, rename, show the size, change file rights, assign a different icon, edit the MIME type or select Options…
Figure 1: ROX filer as a simple file manager

Individual
The Options window that then appears (Figure 2) provides access to the complete range of ROX configuration options, allowing you to select a language, configure the toolbox and even define a maximum window size (ROX filer dynamically adapts to the required size; empty space that occurs when displaying a single file is thus a thing of the past). The Options window provides buttons marked OK, Apply and Save: if you want to save your changes for future sessions, you should use the Save button to close the window instead of just clicking on OK.

Figure 2: Focussed Configuration – the Options

ROX filer can perform Drag & Drop tasks, allowing you to drag files from one ROX filer window to another – of course you can drag them to an editor or any other program that provides appropriate functions, or to a program icon. Since moving multiple files in this way would be extremely time consuming, you can not only use the left mouse key to launch or move single files, but also drag a frame around multiple files to move these files as a group (Figure 3).

Bag of Tricks
Things start to get interesting when you use ROX filer to enhance the desktop and not just as a simple file manager. ROX provides you with so-called panels, (Start-) bars that attach to one of the four margins of the screen and make icons available. You can attach a panel to each margin, assign various layouts to the panels, and name the layouts. If you want to attach actions and content to the left margin and call this panel main, you can use the following syntax to do so:

rox -l=main
Should you now need to remove the panel from the left margin, just tell ROX filer to place an unnamed panel at this location:

rox -l=
Figure 3: Copying Multiple Files
Since there is no such thing as an unnamed panel, the main panel will simply disappear.
Construction Work
If you have recently edited a panel and used Drag & Drop to place a file from the file manager window in it, your action will have been stored in ~/Choices/ROX-Filer/pan_main (you do not need to remember this) and is available for continued use. If you then decide to place the modified panel on the right margin, no problem:
You use l for left and r for right, t to place a panel at the (top) and b to place a panel at the (bottom) margin. If you want to launch a new and empty panel, just create a panel with a different name. You can then drag directories and files of any type to the panel allowing you to utilize it more quickly (Figure 4).
Ranking
You have very little influence on the icons in your panels; they are organized from one corner of the panel towards the center – you get to decide which corner and can use your mouse to do so. However, you can change the order: to do so, pick up an icon you want to move with the middle mouse key and drag it to the desired position. To completely remove an icon from a panel, use the right mouse key to open the menu (the left key will simply execute the file represented by the icon). If your window manager insists on adding unwanted ornaments to your panels, just add -o to the command for launching ROX (rox -o -t=main). This will not take effect while any ROX windows are open. If your window manager still insists on drawing frames around your panels, although the rox command contains the option -o, you will probably have to live with the fact that the window manager does not permit frameless windows.
Figure 4: Adding StarOffice to a Panel
Out of Room?
If you need more room for icons than your panels provide, simply use the desktop surface. In this case, you should ensure that your window manager is more or less GNOME compatible. This will prevent any mouse actions on the desktop being assigned to the window manager and allow them to be evaluated by ROX instead. However, you may be able to place icons on the desktop, even though it is not GNOME compatible. To enable ROX for your desktop, also referred to as the Pinboard, you can launch the program with the -p flag. You will need to supply a name, which means that you can work with multiple sets of desktop icons:

rox -p=desktop

Your window manager must permit frameless and transparent windows for this option. If you have compiled multiple panels or icon collections, you do not need to terminate one element before viewing the next element at the same position. Just add the element you require. If the desktop or panel position is already in use, the previous occupant will simply be replaced. To completely remove an element you will need to call an unnamed panel, as previously discussed.

For Ever and Ever
If you have come to like the ROX desktop, you will definitely want to launch the desktop automatically. The files ~/.xinitrc (via startx at startup time), ~/.xsession (graphic login via kdm, gdm or xdm), or ~/.Xclients are normally responsible for the initial configuration of your user’s X sessions, and this is typically the place to launch your window manager. You will need to start ROX before this happens (icewm in our example):

#!/bin/sh
rox -b=main
rox -p=desktop
exec icewm

The typical & at the end of a command line is not required for ROX – ROX happily retires to the background and releases the shell that spawned its process. But there is more! Even though ROX is called twice (once as a panel and once again for the desktop icons), it only launches a single instance. ■

INFO
[1] Jo Moskalewski: “A Thousand Words”, Linux Magazine Issue 22, p70 ff
[2] Jo Moskalewski: “Background Menu”, Linux Magazine Issue 17, p74 ff
[3] Jo Moskalewski: “XFCE”, Linux Magazine Issue 13, p76 ff
[4] Jo Moskalewski: “The Right Type”, Linux Magazine Issue 23, p86 ff
Figure 5: A Complete ROX Desktop
LINUX USER
Out of the box
tHTTPd
tiny Web
Need to share a directory temporarily and allow other users browser based access? Setting up an Apache server to do this would be slightly over the top. So why not go for the easy approach with tHTTPd? BY CHRISTIAN PERLE
This issue’s “out of the box” is all about a tool with an absolutely unpronounceable name, the “tiny HTTP daemon” – that is, a miniature web server. The server program itself weighs in at a mere 60 to 70 kilobytes. There are several installation methods.
Pre-cooked or do-it-yourself?
Pre-configured binaries are available for the Debian, SuSE, Red Hat, and Mandrake distributions. However, the tHTTPd package is not pre-installed on Red Hat and its offspring – you will need to use rpmfind to locate it (ftp://speakeasy.rpmfind.net/linux/rhcontrib/7.1/i386/thttpd-2.21b-fr0.4.i386.rpm). No matter what distribution you use, you can always revert to compiling the sources available at http://www.acme.com/software/thttpd/thttpd-2.23beta1.tar.gz, and this adds the advantage that you will automatically have the latest released version. If you opt for installing the binaries, you will need to use the package manager supplied with your distribution – rpm for SuSE, Red Hat and Mandrake, dpkg or apt-get for Debian. Installing from the sources requires just a few more steps:

tar xzf thttpd-2.23beta1.tar.gz
cd thttpd-2.23beta1
./configure
make
OUT OF THE BOX
There are thousands of tools and utilities for Linux. “Out of the box” takes a pick of the bunch and each month suggests a little program, which we feel is either absolutely indispensable or unduly ignored.
su
(enter root password)
make WEBGROUP=www install
exit
The thttpd executable is placed in /usr/local/bin or, if you used the RPM package, in /usr/sbin. Additional files are placed in /usr/local/www (source archive), /usr/local/httpd/htdocs (SuSE) or /var/www (Red Hat).
User and root flavored Instant Soup
The easiest way to go is to launch the web server in the directory that you want to share – this is often referred to as instant webserving. You do not need superuser privileges to do so, but you will need to tell the program to listen on a different port than the standard HTTP port 80. In our example we tell tHTTPd to listen on the unprivileged port 4242:

thttpd -p 4242
If you do not want to launch the web server in the working directory, but prefer to share a different directory instead, you can stipulate a directory by specifying the -d directoryname option. Users accessing the server will not be able to see parent directories or the files they contain. If you launch the server with root privileges, it will be allowed to listen on port 80. After you launch the daemon, tHTTPd will drop its root privileges and run as the unprivileged user, nobody. You can apply an additional security measure prior to this phase: if you stipulate the -r flag, the server uses chroot to change to a new root directory that it will not be able to break out of. Figure 1 shows an example of directories shared for browser based access by tHTTPd. The Konqueror web browser was used. File access is provided to /home/chris/public, the directory where the
GLOSSARY
Web server: A service program (server) that allows web browsers to access files via “HyperText Transfer Protocol” (HTTP). The most popular web server is Apache.
rpmfind: A search agent for packages in rpm format. http://rpmfind.net/ contains packages mostly for Red Hat and related distributions.
Port: A docking position for network connections. Ports are designated by number and many of them are assigned to services by this number. Programs that bind to ports can offer their services (such as file transfer or remote login) on these ports.
Unprivileged port: Ports up to and including 1023 require root privileges to bind to a service. In contrast, ports 1024 through 65535 (so-called unprivileged ports) can also be used by user processes.
chroot: The (“change root”) system call defines a new root directory for a process. The process cannot break out of this directory and is thus isolated from the rest of the file system.
Symlink: Abbreviation for “symbolic link”. A symlink is a special file that contains a path (the target referred to). If you attempt to read or write to a symlink, the system will in fact access the target. Symbolic links are created by the ln -s syntax.
CGI: Short for “Common Gateway Interface”. This mechanism is used by the web server to create dynamic web page content. You could use a CGI shell script to display details of the users currently logged on to the server, for example.
Figure 2: No access is allowed at this point
Figure 1: Konqueror Access
server was launched. As the server is not listening on the standard port (port 80), the users will need to supply the port in the URL, such as in http://localhost:4242 if they want to access the local web server.
Forbidden Fruit
Even if tHTTPd is not running in a chroot jail, it will still restrict access to files and directories below the directory it was launched from. Figure 2 shows a user attempting to access the no_charttoppers directory, which represents a symlink to a target outside of the daemon’s home directory. This makes tHTTPd issue an HTTP 403 (“forbidden”) error – in other words access to this directory is not permitted. Double dot .. (parent directory) attacks designed to break out of the server’s home directory are also refused by the server.
Listing 1: Configuration file myweb.conf

# Configuration file for thttpd
port=4242
dir=/home/chris/public
logfile=/home/chris/thttpd.log
cgipat=/cgi-bin/*.cgi
When you launch the web server, you may notice that the program immediately disappears from the terminal where it was launched and retires into the background, just like a well-behaved daemon should. But how do you stop the daemon? You can always use the ps command (which outputs a list of the active processes) to obtain the necessary details, i.e. the process ID. We additionally use grep to filter the output, leaving only entries that contain the thttpd string, and send a kill to the process ID that we find, in order to terminate the process.

chris@camera:~ $ ps axc | grep thttpd
 2752 ?        S      0:00 thttpd
chris@camera:~ $ kill 2752
If messing around with ps and grep is not your idea of fun, you can use the -i pidfile syntax to have the server write its process ID to a file when you launch thttpd. In this case, you can use the following command to kill the process:

kill $(cat pidfile)
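Putting the two together – the path /tmp/thttpd.pid is merely an example, any writable location will do:

thttpd -p 4242 -i /tmp/thttpd.pid

and later, once the server is no longer needed:

kill $(cat /tmp/thttpd.pid)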
Logbook
As you will normally want to know what is happening on your server, you might like to try the -l logfile option to log accesses to the server. In the following example tHTTPd was launched on port
Listing 2: logfile thttpd.log

chris@camera:~ $ thttpd -p 4242 -d ~/public -l ~/thttpd.log
chris@camera:~ $ tail -f ~/thttpd.log
10.0.0.200 - - [18/Jun/2002:15:03:22 +0200] "GET / HTTP/1.1" 200 50000 "" "Mozilla/5.0 (compatible; Konqueror/2.2.1; Linux)"
10.0.0.200 - - [18/Jun/2002:15:03:27 +0200] "GET /favicon.ico HTTP/1.1" 404 0 "" "Mozilla/5.0 (compatible; Konqueror/2.2.1; Linux)"
4242 and in the shared directory /home/chris/public; the logfile is called /home/chris/thttpd.log. Now let’s take a look at the entries the server writes to the logfile – see listing 2. A host with the IP address 10.0.0.200 has accessed our server twice, using the Konqueror browser in both cases. The first request was to read the start directory. The browser has no way of knowing that / really is /home/chris/U public. The second request attempts to read a file called /favicon.ico, but as this does not exist, the server issues the HTTP Error 404 (“not found”).
Option Sets
If you often launch tHTTPd with the same set of options, you can save yourself some typing by placing them in a configuration file. In this case, the only option you will need is -C configfile, which assigns a configuration file to the server. The entries in the file are slightly different from the command line options. Listing 1 contains an overview of some commands we have already used, plus a new option. To launch the server, you simply type thttpd -C myweb.conf. The line with cgipat=/cgi-bin/*.cgi allows tHTTPd to run CGI scripts, provided they are stored in /cgi-bin below the server root directory and carry the .cgi suffix. Another program in the package, htpasswd, allows you to password protect specific directories: if you use this tool to create a .htpasswd file and then place the file in a shared directory, any users accessing the directory via their web browsers will need to know the stored access credentials. If you intend to use tHTTPd as a permanent web server, local users can add their own homepages by typing makeweb. The best way to find out more about options like this is to read the man pages (man htpasswd, man makeweb) and the online documentation on the tHTTPd website. ■
Lego Toys with Leocad

Brick for Brick

You’ve heard of them – everybody has – those little plastic bricks with lumpy bits on top that you can use to build amazing houses, cars, ships and trains. Now Leocad allows you to construct Lego models on screen. BY FRANK WIEDUWILT
Lego on a PC? That might not sound like much of a hands-on experience, but Leocad does allow you to create Lego models and construction plans on your PC. As well as the Windows version there is a slightly restricted Linux version, which we tested on a 750 MHz AMD Duron with 128 Mbytes of RAM and on a 133 MHz Intel Pentium I with 32 Mbytes of RAM. The program ran on both systems – a powerful video adapter and an X server to match are the most important hardware features to look for. Without these, the images take ages to display and it is almost impossible to use the program. Leocad requires OpenGL. If you see the OpenGL not supported message when you launch the program, you will need to install this feature from your distribution CD. If you still experience problems after this step, refer to Box 2 where you will find a workaround. Thanks to pre-configured binaries, installing the Lego modeller should be child’s play: You simply download the leocad-0.73-update.tar.gz archive from the program website www.leocad.org [1] or from the subscription CD, and ensure that you are the root user in order to expand the archive in /usr/local/bin. This places the leocad executable in /usr/local/bin/leocad-0.73. You can use

ln -s /usr/local/bin/leocad-0.73/leocad /usr/local/bin/leocad
to create a link to the program in a directory included in your path. This step will save you entering the complete path each time you launch the program. You will also need the pieces.zip file, which is also available on the Leocad homepage or the subscription CD. The file contains descriptions of Lego building blocks. Use the unzip tool to unzip this file in /usr/local/share/leocad. You should now drop your root privileges and launch the program by typing leocad & at the console.
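If you prefer the whole procedure at a glance, it might look like this – a sketch only: the /tmp download location is our assumption, and su is just one way to obtain root privileges:

su -                                    # become root for the install steps
cd /usr/local/bin
tar xzf /tmp/leocad-0.73-update.tar.gz  # creates leocad-0.73/
ln -s /usr/local/bin/leocad-0.73/leocad /usr/local/bin/leocad
mkdir -p /usr/local/share/leocad
cd /usr/local/share/leocad
unzip /tmp/pieces.zip                   # the Lego brick descriptions
exit                                    # drop root privileges again
leocad &                                # launch as a normal user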
If you intend to compile the program from the source code, you will need to download the current version from CVS. Box 1 shows you how to do so.
In the beginning… there were bricks
When you launch Leocad you will see something similar to Figure 1, with the main menu and toolbars at the top, the drawing window on the lower left, and the selection list for the individual Lego bricks to the right. For easier viewing you may want to display a grid on the drawing area; this will help you keep your bearings while compiling the model. To do so, select View / Preferences in the menu. Then in the configuration dialog that appears
click on the Drawing Aids tab and enable the Base Grid option; keep the grid size for the time being. The Axis Icon option in the same tab is also useful. When enabled, this function displays the coordinates in the lower left corner of the drawing, which clearly shows the current directions of the X, Y, and Z axes. To place a brick in the model, you must first select the required element in the component list and then select Piece / Insert in the menu. The mouse cursor changes to a small cross, and you can then click, or press the [Ins] key, to place the brick in the drawing. You can use the arrow keys, or [Page Up] and [Page Down], to position the element. The arrow keys move the brick in the
horizontal plane, [Page Up] moves the brick up and [Page Down] moves it down. The current position of the selected element is displayed in the status line. You can use the keyboard shortcuts [Ctrl+Page Up] and [Ctrl+Page Down] to rotate a selected brick (i.e. a brick with a blue frame surrounding it) about its vertical axis. [Ctrl] plus an arrow key rotates a brick about a horizontal axis. Table 1 provides an overview of the keyboard shortcuts used for moving bricks. To select several bricks, keep the [Shift] key pressed while selecting. To avoid splitting up units, you can group elements by selecting Piece / Group. The array function helps you clone
several bricks on top of or next to one another. Just select the brick you want to duplicate in your draft and call the dialog box by selecting Piece / Array. Use the Count field under Dimensions to define the number of new bricks to create. Use Move under Transformation to decide the number of units to place the new bricks by relative to the last brick. Use Rotate to enter a rotation angle for the bricks.
New Lego People
When you are building Lego people, Leocad helps you out with the Minifig Wizard (Figure 2), allowing you to create the character from individual pieces, where each piece can be assigned
an individual color. You can choose the required elements from a list and use the color button to define their colors. To save a design, type a name in the list on the lower left and then click on Save. If you want to change your design later, you simply select its name from the list.
Step by Step
This procedure will take you closer to your construction goals, brick by brick.
Box 1: Source Code via CVS
Unfortunately, the Leocad homepage does not provide a source code archive that users can download. Instead, you will need to download the source code from the CVS repository at gerf.org. Assuming that you are online, you can use the following CVS syntax to do so:

cvs -z3 -d :pserver:guest@gerf.org:/usr/cvsroot login

(Use guest as your password.)

cvs -z3 -d :pserver:guest@gerf.org:/usr/cvsroot checkout -r leocad-0-73 leocad

After checking out the sources you can change to the source code directory and start compiling the program by typing make. Finally, copy the executable leocad to a directory of your choice.
Figure 1: Leocad

GLOSSARY
OpenGL: A library for displaying and manipulating three-dimensional graphic objects.
Binary: An executable program.
Link: A reference to a file or directory – in other words, a second name by which you can call a file or directory.
Path: The PATH environment variable comprises the directories where the operating system will search for programs or scripts without the user needing to supply the complete path.
Source code: A collection of instructions in a programming language that need to be translated by the computer (in the case of compiler languages) in order for the operating system to run them.
CVS: The “Concurrent Versions System” is a source code management tool for managing code simultaneously authored by multiple programmers. CVS ensures that all the staff working on a project are using the same version of the source code. You can also extract older source code versions from the CVS repository.
Figure 2: Compiling a figure
To ensure that actually building the model will be easier in real life, you can record the construction of the model in single steps and replay these steps later. The toolbar at the bottom of the screen contains commands for saving and replaying the building steps. After each step that you would like to record, simply click on the right arrow symbol. The number of the current construction step is incremented and displayed in the status line to the right of the Step keyword. If you want to run through the construction steps for the model, just step through your design by clicking on the buttons.
Points of View
There are seven different camera positions that allow you to view the model from various perspectives. Choose View / Cameras to access these options. You can also open several views of the same model at the same time. View / Viewports offers a selection of various view modes. The large window shows you the view from the main camera, and the smaller windows show you front, rear and top views of the model (Figure 3). You can use any of these windows for construction tasks – your changes are displayed in all the views. However, working with more than two views of a model does tend to impact performance.
Figure 3: Multiple Views of a Model
Background Scenes
If the white background gets too boring for you, you can apply a new background to your current scene via the “View Preferences” dialog box in the Scene menu. Use Solid color under Background to select a single color for the background. Gradient allows you to create a color gradient from top to bottom, where you can choose the top and bottom colors via the left and right buttons. The Image option even allows you to use a picture for the background. The Environment area allows you to inject Fog into the picture – the fog will become more dense as it approaches the background. Choose Draw floor to use a solid green background. Figure 4 shows you a scene with color gradient and light mist. The Ambient light option is also important, allowing you to choose the color for highlighting. The brighter the tone you select, the more intense the lighting that Leocad applies to the scene.

Box 2: Possible Issues with OpenGL
Leocad refused to work on the Mandrake 8.1 system we were using for our test, issuing the OpenGL not supported error message instead. To resolve this issue you might like to install the latest version of the OpenGL library Mesa 3D, which is available from the Mesa website [2] or from the subscription CD, in the archive file MesaLib-4.0.3.tar.gz. Use tar -xzvf MesaLib-4.0.3.tar.gz to unpack the file and then change to the newly created Mesa-4.0.3 directory, where you can start to build the program using the ./configure and make commands. Finally, ensure that you have root privileges in order to install the library to /usr/local/lib using make install. To ensure that Leocad can access the library you will need to edit /etc/ld.so.conf so that /usr/local/lib is at the top. Finally, while you are still working as root, call /sbin/ldconfig and then drop your root privileges.
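Condensed into commands, the procedure from Box 2 might run as follows (a sketch: su is just one way to obtain root privileges, and the editor is your choice):

tar -xzvf MesaLib-4.0.3.tar.gz
cd Mesa-4.0.3
./configure && make
su
make install                  # installs the library to /usr/local/lib
vi /etc/ld.so.conf            # put /usr/local/lib at the top of the file
/sbin/ldconfig                # refresh the linker cache
exit                          # drop root privileges again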
Construction Plans for the WWW
To exchange models and construction plans with other Leocad users, you can export your plan to a series of HTML files that document the construction steps for your model one by one. To do so, select File / Export / HTML. The Layout option in the dialog box shown in Figure 5 allows you to decide whether to show all your construction steps on a Single page, or One step per page instead.
Table 1: Important keyboard shortcuts for moving bricks

Action                             Keyboard shortcut
Move a brick along the X axis      [Left Arrow]/[Right Arrow]
Move a brick along the Y axis      [Up Arrow]/[Down Arrow]
Move a brick along the Z axis      [Page Up]/[Page Down]
Rotate a brick about its X axis    [Shift+Left Arrow]/[Shift+Right Arrow]
Rotate a brick about its Y axis    [Shift+Up Arrow]/[Shift+Down Arrow]
Rotate a brick about its Z axis    [Ctrl+Page Up]/[Ctrl+Page Down]
Figure 5: HTML Export
Figure 6: HTML Export Options – defining Image Formats
Figure 4: Scene with Colored Background
Pieces list allows you to define whether the element list will be output After each step or At the end. Enter the directory where you want to store the construction plan in the Output directory field. The path must end in a forward slash – if not, the program will store the files in the wrong directory. Click on the Images… button to open a dialog box where you can define the format for your graphics (Figure 6).
Nice View
Although the 3D view in Leocad is quite appealing, the models do not look realistic. Fortunately you can render your models with a little help from Povray. Besides the raytracer you will also require the lgeo library, which is available from www.el-lutzo.de [3] or on the subscription CD. Use the unzip lgeo.zip command to unzip the file in a directory of your choice – you will need write access.
GLOSSARY
HTML: The “Hypertext Markup Language” is the language World Wide Web pages are written in.
When you select File / Export / Pov-Ray, Leocad opens a dialog box where you can enter the paths to the Lgeo library and Povray, and type a file name for the Povray file you want to create (Figure 7). It is a good idea to save your Povray files in the same directory as Lgeo, since the raytracer may otherwise not be able to find the elements it needs to integrate. Due to a few errors in Lgeo you will not be able to use the brand new Povray version 3.5 to render models. However, rendering should be no problem with Povray version 3.1. The raytracer will display a few warnings, but nothing more serious. To really appreciate the realistic three-dimensional scene, you will need to open the Povray file with a suitable Povray front end, such as peflp.
Why bother?
If you enjoyed playing with Lego as a child, you will find it hard to escape from the fascination of designing Lego constructions on your PC. Leocad is extremely addictive! Unfortunately, the Linux version is not quite up to par with the Windows version, which is currently better maintained and at a later development stage. You can use the program on older computers, provided they have a 3D accelerator card, a suitable X server and enough main memory, but to create models with thousands of components you will need state-of-the-art hardware.
Figure 7: Export Options for Povray
There are also some stability issues connected to OpenGL and nVIDIA graphics chips that might spoil your fun if you have the wrong hardware. Unfortunately, documentation is a different matter: The program is supplied without online help, and only a short tutorial is available from the Leocad website [4]. ■
INFO
[1] http://www.leocad.org/
[2] http://www.mesa3d.org/
[3] http://www.el-lutzo.de/lego/zips/lgeo.zip
[4] http://www.leocad.org/tutorial/basic.htm
Regular Expressions
Needles in a Haystack

You come across regular expressions at every step of the way on Unix systems. But what exactly are they, and how do you use them? BY MARC ANDRÉ SELIG
In short, regular expressions are simply placeholders for specific strings. Regexps, as the abbreviation goes, are used for a variety of search operations – for example, in most text editors, as well as in your favorite Unix scripting language. Innumerable special functions and exceptions make regular expressions a slightly dry experience, or even a daunting prospect. If you are still struggling with the basics, I really would like to wish you the best of luck – believe me, it is worth the effort!
Encounters with Regexps
If you make regular use of the command line, you will definitely use grep for searching in files.
This tool even derives its name from regular expressions. The ed command g/re/p prints every line in a file that contains the regular expression re – and that is exactly how grep works. grep requires at least one search expression as a command line argument. The target file(s) for the search operation are supplied as an additional argument. Imagine you want to create a list of all
the users on your system whose name starts with the letter “t” and who use bash as their login shell. You can type

grep -i ^t.*bash /etc/passwd
to display them on screen.
TABLE 1: USEFUL GREP OPTIONS

Command   Description
-i        Ignore case
-l        Display only the files where the regular expression occurs – do not display the lines of text
-v        Reverse search: display only lines of text that do not contain the expression
-3        Display the line containing the expression and three lines before and after that line
-A 3      Display the line containing the expression and three lines after that line
-B 3      Display the line containing the expression and three lines before that line
If you do not supply a file name on the grep command line, the tool simply searches standard input. This is quite useful if you need to process another program’s output. Imagine you wanted to search for the Konqueror processes belonging to the user mas; you could type the following command:

ps -auwx | grep mas.*konqueror
The forward search / and reverse search ? functions in the vi text editor also accept regular expressions. However, regexps would tend to cause confusion in emacs, where incremental search operations are involved. In this case you need to explicitly search for a regexp. The command for a standard incremental regexp search is M-C-s, that is [Ctrl-Alt-S]. M-x search-forward-regexp [Alt-X search-forward-regexp] is slightly more straightforward: In this case you first enter the complete search expression and then perform the search. Some people’s first experience with regexps comes from perl, but most other scripting languages (such as Tcl, Python and PHP) can handle them. There are some lesser known regexp consumers such as the sed or awk languages, but even the C++ GUI toolkit Qt, which you can use to write KDE programs, can handle regular expressions.
Building Blocks
In the case of simple regular expressions, you merely enter the string you want to search for. You can type most ASCII characters directly: Even a single character can be a search string, albeit a primitive one. Our next lesson is: There are some metacharacters that enable special functions. The most commonly used metacharacter is the period, which is a wildcard for any other character. The regexp be.. will search for be followed by any two characters. This would allow you to find both “bear” and “bean” and even “bed” (followed by a space character). However, what it would not find is a lonely “bed” without the trailing space character. The ^ character matches the start of a line. This kind of character is referred to as an anchor, as it anchors the search expression at the beginning of a new line. You could therefore type ^bear to search for the word “bear” at the
beginning of a new line. The dollar sign $ does the same thing at the end of a line. So if you want to search for the last bean in a line you would need to type bean$. Since regular expressions are normally evaluated line by line, additional characters prepended to ^ or appended to $ normally make little sense. Languages like Perl sometimes allow exceptions in this case – we will get back to that subject later. Regular expressions are really useful when you use combinations! ^One$ will match any lines that contain only the word “One” and nothing else. You can type ^$ to find empty lines where the new line character is immediately followed by the end of line character. Warning: Constructs such as ^ and $ do not represent characters in the target files but instead match positions between two characters.

grep ^a$ textfile
for example, will output exactly two characters for each match, the letter “a” and a new line, although the regular expression comprises three characters.
Alternatives and Repetitions
Sometimes you do not know exactly what you are looking for. If multiple occurrences of a character (or an expression to be more precise) are permissible, you can use one of the following operators: The asterisk * indicates that multiple occurrences of a character are permissible. a* thus represents one or multiple “a”s – or even an empty string that contains exactly zero “a”s and thus complies with the requirement “any number”. Mo*rs thus matches “Moors” or “Mooooooooors” but also “Mrs”. In contrast to this, the plus character + repeats a character – it must occur at least once, but can occur more often. The regular expression jj+n will search for two “j”s, one of which can be repeated, followed by an “n”. Thus this expression will match “jjn” or “jjjjjjjjjn”, but not “jn”. Most of the time repetition wildcards such as + and * are not used in the context of specific characters but to repeat the period that represents any
other character. The regular expression .* thus represents any number of occurrences of any character, i.e. it represents any string. The regexp Jo.*nes can thus represent any line that contains “Jo” followed by “nes” anywhere in the line – this matches “Jones”, “Johannes” but also “Joe bought some fresh meaty bones for his dog”. .* will always search for a match that is as long as possible. The expression a.*b in the string “abcabcabc” thus matches “abcabcab”. If you do not need this, you can just add a question mark in perl: .*? will search for a match that is as short as possible, so a.*?b will simply find “ab” in “abcabcabc”. You can use a single question mark ? to represent optional occurrences: The preceding letter can occur exactly once or not at all. ab?c represents “ac” or “abc”. The option of defining the number of repetitions is more rarely used, and we will be discussing it for the sake of completeness only. If you need this option, you use braces that contain the minimum and maximum counts. a{1,7} represents one through seven “a”s. a{,7}
GLOSSARY
Script language: A programming language typically used for authoring (mainly smaller) programs (scripts) that do not need to be converted into an executable format by a compiler, but are interpreted and executed by an interpreter when the source file is called. The most common examples are shells (such as Bash) or Perl.
ed: The classic Unix line editor. In contrast to modern text editors it does not allow you to edit a whole file, but provides commands that can be applied to a single line or multiple lines.
Login shell: The shell that presents itself to a user when he or she logs on via the command line. The system administrator uses an entry in the /etc/passwd file, which also includes the user name, to specify which file this should be. (Modern Linux systems normally no longer store encrypted passwords in /etc/passwd but shadow them in the /etc/shadow file.)
GUI Toolkit: A program library that provides functionality for authoring graphical user interfaces (GUIs), for example, classes for windows, scroll boxes and menu bars. The most common GUI toolkits for Linux, Qt for C++ and GTK for C, also provide a range of classes and functions for other purposes.
means at the most seven, and a{4,} at least four “a”s. You can use the pipe character | to use alternative search expressions. Thus,

grep -E '(bus|train|plane)' vehicles.txt
will only show public transport vehicles. The grep flag -E tells the search tool to expect an “extended” regular expression. Without this option the tool will interpret this as a normal search string.
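To see the greedy/non-greedy distinction from above in action, a quick test with a perl one-liner helps (our own example; $& holds whatever matched, as described later in this article):

echo "abcabcabc" | perl -ne 'print "$&\n" if /a.*b/'     # prints abcabcab (greedy)
echo "abcabcabc" | perl -ne 'print "$&\n" if /a.*?b/'    # prints ab (non-greedy)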
Character Ranges
A character range is an expression that represents multiple characters, e.g. only letters or accented characters. This is not used very often for interactive tasks with emacs or vi, but character ranges are quite useful in scripting languages – for example, you can define a range of valid input characters when you are programming CGIs. Ranges are defined by square brackets surrounding the valid characters. [abc] includes “a”, “b” or “c”. You can combine ranges like the one just mentioned with other metacharacters: [abc]+ thus matches “a”, “aa”, “abababc”, etc. Dashes (alias minus signs) facilitate the definition of larger ranges. [a-z] represents any lower case letter. [a-zA-Z0-9] represents any alphanumeric character. If the dash itself is to be included in the range, you will need to place it at the start of the range. The range [-+a-z] comprises lower case letters and the plus and minus signs. If the first character within the square brackets is a circumflex accent ^, the
definition of the range is inverted, i.e. the expression only matches strings where these characters do not occur. To search for any character with the exception of “Z”, you can use [^Z]. If you want to find any lines in a file that do not start with “Y” or “Z” (and are not empty), you would type ^[^YZ]. The circumflex at the start of the regexp anchors the expression at the start of a line, which must be followed by any character apart from Y or Z.
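Input validation of the kind mentioned above can be tested straight from the command line – a small sketch of our own (the pattern accepts only letters, digits and the underscore):

perl -e 'print(($ARGV[0] =~ /^[a-zA-Z0-9_]+$/) ? "ok\n" : "rejected\n")' 'joe_99'       # prints ok
perl -e 'print(($ARGV[0] =~ /^[a-zA-Z0-9_]+$/) ? "ok\n" : "rejected\n")' 'joe;rm -rf'   # prints rejected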
Predefined Ranges
Most libraries containing regular expressions define short forms for common ranges, which can save you some typing. As you might have guessed from the lack of enthusiasm in the last sentence, you should not expect global standardization … You will find an overview of some of the most important pre-defined ranges in Table 2. The table includes the so-called POSIX character classes, which most engines – such as grep or perl – accept. You will also find the perl short forms, which are somewhat cryptic to understand but easier to type. The big advantage of these pre-defined ranges in comparison to homegrown definitions such as [a-zA-Z] is that pre-defined ranges normally respect the locale, i.e. if you work in a French locale, accented characters count as alphanumeric characters. The second big advantage does not really apply to Linux: Theoretically you could have a character set that is incompatible with ASCII, where the letters of the alphabet are out of sequence or not correctly sorted.
TABLE 2: PRE-DEFINED RANGES IN POSIX AND PERL

POSIX Character Class   Short form in perl   Description
[[:digit:]]             \d                   “number”: a number between 0 and 9
[^[:digit:]]            \D                   “not a number”: anything apart from digits
[[:alpha:]]                                  “alpha”: letters (including local accented characters and similar)
[[:alnum:]]                                  “alphanumeric”: letters and numbers
[[:word:]]              \w                   “word”: alphanumeric character or underscore “_” (not in POSIX!)
[^[:word:]]             \W                   “non-word”: not alphanumeric or the underscore
[[:lower:]]                                  lower case letters
[[:upper:]]                                  upper case letters
[[:punct:]]                                  punctuation marks
[[:space:]]             \s                   “whitespace”: space character, tab or new line; POSIX includes the rare vertical tab
[^[:space:]]            \S                   “non-whitespace”: anything apart from space, tab or new line
[[:blank:]]                                  “horizontal whitespace”: space or tab
In this case [a-z] might not include all the lower case letters (which it should), but instead include some special characters (which it should not). [[:lower:]] is guaranteed to contain lower case letters only, no matter what character set is in use.
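A quick demonstration with egrep (our own example; with egrep or grep -E the + operator works unescaped):

egrep '^[[:digit:]]+$' data.txt    # lines consisting of digits only
egrep '[[:upper:]]' data.txt       # lines containing at least one upper case letter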
Special Characters You can use special characters (such as umlauts, tabs, control characters, etc.) for most implementations of regular expressions. To do so, just enter them directly as a regexp. If this is impractical (because you cannot distinguish a tab from a space in a program listing), you can use the notation common to C and most shells, for example \t for a tab. If you need to search for a backslash, type the character twice: \\.
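Two command line examples of our own (the $'\t' notation for a literal tab is a bash feature):

grep $'\t' filename.txt    # lines containing a tab character
grep '\\' filename.txt     # lines containing a backslash; single quotes keep both characters intact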
Groups
Before we get down to practical cases, let us look at another important construct. Repetition characters such as * always apply to the character, or to be more precise, the expression that immediately precedes them. Thus, abc* repeats only the “c”; this regexp will match “abcc”, but not “abcabc”. However, you can use parentheses to group a regexp. (abc) effectively means the same as abc, but the internal workings of the search operation are entirely different: abc comprises three search expressions, “a”, “b”, and “c”, which must be found in sequence.
GLOSSARY
CGIs: Scripts or compiled programs that are stored on a web server, are launched when a specific web page is called, and generate an HTML document “on the fly” (dynamically). CGI is short for “Common Gateway Interface”.
POSIX: An attempt to standardize typical Unix functionality and definitions (IEEE Standard 1003.1). Most Linux programs can be made POSIX compatible if required, although this may mean doing without some of the more advanced functionality.
Locale: POSIX supports automatic adaptation of programs to the local environment. One obvious example would be displaying local translations of system messages or man pages. The locale also includes local formats for time or date values, measurements, or preferred paper sizes, and of course information on the function fulfilled by specific sections of the character set – that is, whether character 196 should represent an “Ä” or a non-printable character.
TABLE 3: PERL VARIABLES FOLLOWING A SUCCESSFUL SEARCH

Name               Description
$&                 The last string found, i.e. whatever matched the regexp.
$` (backtick)      The part of the string before the match.
$´ (forward tick)  The part of the string after the match. After a successful search the entire search string is split into $`$&$´.
$+                 The last matching group. If a regular expression comprises multiple group constructs, where some are optional, you can use this to access the contents of the last group.
(abc) contains the group “abc” as an individual search expression to which other functions can be applied: (abc)* repeats the whole group. This expression will match both “abcabcabc” and “abc” or even an empty string, but not “abcc”.
The Taming of the Shell
One important reason for the “popularity” of regular expressions is the incredible confusion that using them can cause. Regular expressions become very difficult to read once they start to get more complex. To prove the point, here is an example from the perlfaq6 man page:
/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|\n+|.[^/"'\\]*)

This monstrosity is supposed to find comments in C programs while at the same time ignoring possible comment characters in strings. No, I have not tried it, and no, experts cannot really “read” expressions like that, although they might be able to piece together the eventual outcome. Various other complexities can make your life miserable. Regular expressions are used in thousands of different programs, and of course each program has its own implementation with proprietary features and a small smattering of exceptions. The first trap you tend to fall into is backslashes, particularly in combination with parentheses to provide group definitions. As we have seen, (abc)+ searches for repetitions of the “abc” string, such as “abcabc” or “abcabcabc”. Unfortunately you often have to enable special characters like parentheses or the plus by adding a backslash. If you use perl, the correct regexp would be (abc)+. However, in grep you would need to type \(abc\)\+. egrep or grep -E searches for “extended regular expressions” and understands (abc)+ as is. The shell really likes backslashes – in fact so much so that it eats them for breakfast. Would you like a demonstration? The command
echo \(abc\)\+
displays “(abc)+”. See what I mean – the shell has “eaten up” the backslashes. So you need to prevent backslashes from disappearing into the depths of the shell by adding another backslash. grep \\\(abc\\\)\\+ works, but it might be easier and more readable to add quotes instead: grep '\(abc\)\+' has the same effect and is clearer.
Differences in Libraries
I have already mentioned that different programs will deal with regular expressions in different ways. If you only have to deal with a single tool when authoring a perl script, for example, this probably will not cause you too many
headaches. But if you switch tools, or use multiple tools simultaneously, you may find yourself facing a few issues. The differences mainly concern two points. Does the program expect a “simple” or “extended” expression? This boils down to the question of whether you need to enable special functions, such as + or parentheses, by prepending a backslash \, or can assume that the functions are enabled by default (and need to be disabled using a backslash). Rule of thumb: Most script languages use extended regular expressions; you can thus directly use the functions presented here. In addition to grep we can also use egrep, which understands extended regexps. Most editors and command line instructions, however, expect simple regular expressions. Generally, if something does not work, try again with backslashes added at the strategic points. The second issue is not such a big deal in real life situations. Sophisticated special functions are normally private extensions that, of course, will not be available in any other program. Perl, for example, will allow you to place comments in regular expressions and offers special functions for virtual expressions (predictive expressions that check whether a certain string follows the regexp without the text needing to be part of the match) – this would make no sense in grep.
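Side by side, the two dialects look like this (our own example, repeating the pattern from above):

grep '\(abc\)\+' filename.txt    # simple (basic) regexps: operators need backslashes
egrep '(abc)+' filename.txt      # extended regexps: the same search, unescaped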
Back References
As previously discussed, parentheses are used to define groups of characters. So far we have only used these groups to repeat strings. But the genuine task for these groups is completely different: A group can define a substring that you can refer to later.
Listing 1: Sample file connections.txt

B      by     to P   by     to M
10:45  bus    11:52  train  13:05
10:49  train  11:19  train  12:05
11:45  bus    12:54  bus    15:10
12:45  bus    13:51  train  15:05
13:49  train  14:19  train  15:05

TABLE 4: MODIFIERS FOR REGULAR EXPRESSIONS IN PERL

Modifier (trailing the last slash)   Description
/i   Non case-sensitive search.
/s   The period also applies to new lines in the string where you are searching.
/m   The string you are searching in can contain multiple lines (similar to grep), where ^ and $ always represent the start or end of a line in the string.
/g   On searching and replacing, do not stop at the first match, but continue through the whole string.
When the program finds an expression in parentheses, it will store the expression, allowing you to call it in a different part of the program. Example: The regexp (bus|train) matches both a bus and a train. Since the expression is enclosed in parentheses, the software records any matches – “bus” or “train”. The string is stored in a variable with a serial number between 1 and 9 for the sake of simplicity. Variable 1 comprises the contents of the first group, variable 2 the content of the second group, and so on. You can use the \1 ff. construct to refer to these variables. Let’s use the file connections.txt from Listing 1. This contains a list of the connections from Berlin via Paris to Madrid. Now when I’m travelling, I try to avoid walking from the bus station to the train station. So I really just want to view the connections that allow me to travel the whole trip by bus or by train. I can use the following syntax to do so:

egrep '(bus|train).*\1' connections.txt
The regexp first looks for “bus” or “train” and saves the result in variable 1. Any string can follow this (“.*”), provided the content of variable \1 also occurs.
Listing 2: Output from egrep command

10:49 train 11:19 train 12:05
11:45 bus 12:54 bus 15:10
13:49 train 14:19 train 15:05
So if “bus” occurs, the line must also contain a second instance of the word “bus”; in the case of “train” the word “train” must occur twice. The output is shown in Listing 2.
Backward References in Perl
Perl also handles backward references perfectly. Within a regular expression the content of the groups is stored in the special numerical variables \1 through \9. Additionally, perl provides regular perl variables that you can use to access matches outside of the regular expression. The content of \1 is thus placed in the perl variable $1, and the content of \2 in $2, etc.
The following is a quick example from a primitive parser:

/^\s*From\s+(\w+)\s+Type\s+(\w+)\s+Seq\s+(\d+)\s+DB\s+(\w+)\s*$/i
  or die "Invalid pattern in packet: \"$_\"";
my ($from, $type, $seq, $db) = ($1, $2, $3, $4);
Now the variables $from, $type, $seq and $db contain the matching strings. If you type

From mas Type ACK Seq 4219 DB Pharma
the result is $from = “mas”, $type = “ACK”, $seq = 4219 and $db = “Pharma”.
Additional Perl Functions
Besides groups and the like, perl sets additional variables after performing a search. Table 3 provides an overview. The perl regexp engine is without a doubt one of the quickest available for Unix. A few practical perl one-liners will allow you to save some work on the command line. The following line allows perl to function as a substitute for grep, outputting any lines that include the regular expression abc:

perl -ne 'print if /abc/' filename.txt
Perl can also be used as a substitute for sed. In our example the regexp abc is replaced by the string def and the results are displayed on screen:

perl -pe 's/abc/def/g' filename.txt
The next command is similar; however, here we are replacing abc with def directly inside filename.txt and creating a backup copy in filename.txt~:

perl -pi~ -e 's/abc/def/g' filename.txt
You will normally use m/regexp/ or simply /regexp/ for simple search operations in perl. To search and replace, use s/regexp/replacetext/ instead.
You can also use one of the modifiers found in Table 4 after the last slash.
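A one-liner of our own shows /g and /i in combination:

perl -e '$_ = "Bus bus BUS"; s/bus/coach/gi; print "$_\n"'    # prints: coach coach coach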
Troubleshooting
Complex regular expressions rarely work on your first attempt. Troubleshooting regexps costs time and nerves – make sure you have an ample supply of coffee or cola! The first step on the command line should be to prepend the echo command. Remember, the shell might devour some of your characters. You can soon get to the bottom of this issue by displaying the command before it is executed. But even in scripts you would be well advised to display the regular expression, the matches or at least part of these. You should place commands that allow this in your script. The perl debugger permits interactive debugging of regular expressions. If that does not help, simply split up the expression into smaller sections. Check whether all the parts really do what you intended them to. And most importantly – stay calm!
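Both techniques in a nutshell (our own sketch; perl -de1 starts the perl debugger on a dummy program, and the debugger’s x command dumps the result of an expression):

echo grep \(abc\)\+ filename.txt    # prints: grep (abc)+ filename.txt – the backslashes are gone
perl -de1                           # then, at the DB<1> prompt, try:
                                    #   x "abcabc" =~ /(abc)+/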
Prospects
This short article should provide you with an impression of the capabilities and possibilities, but also the complexity, of regular expressions. If you want to delve deeper into this field, there is only one way to go: practise. Reading or dissecting complex regexps that you come across in scripts is both instructive and frustrating. You might prefer to keep practising with regular expressions of your own instead. The first time you need to perform two similar searches in short succession, you might find yourself wondering if a single regexp would also have done the trick. Also have a look at the grep and perlre man pages and your perl documentation: man perltrap and man perlfaq6. ■
THE AUTHOR
Marc André Selig spends half of his time working as a scientific assistant at the University of Trier and as a medical doctor in the Schramberg hospital. If he happens to find time for it, his current preoccupation is programming web based databases on various Unix platforms.
The monthly GNU Column
Brave GNU World

Welcome to another issue of the Brave GNU World. We may always have suspected it, but this issue presents some proof: Free Software is good for your health! BY GEORG C. F. GREVE
In this monthly column we bring you the news from within the GNU project. We aim to give you an insight into the programs and some of their philosophies. In this issue we will look at the ways Free Software is helping in the world of medicine and health. After that we take a look at another alternative to .NET, and then finally move on to some strategy games to take your mind off work.
Debian-Med
Like many areas, the area of medical applications has a large quantity of projects which are sometimes quite advanced, but it is beyond the skills of any “normal” user to assemble them or integrate them into a solution. To solve this, Andreas Tille began the Debian-Med [5] project in early 2002, which aims at customizing the Debian distribution for users in the medical and microbiological fields and seeks to integrate software in these areas. Frequent readers of the Brave GNU World might be reminded of the Debian-Jr. [7] project introduced in issue #23 [6], and in fact the idea for Debian-Med was inspired by that project. Software which needs to interact with the private and the very intimate spheres of human life – as is the case for doctors – has to fulfil certain basic criteria, which currently only Free Software can hope to fully satisfy.
It must, for instance, be sure that the confidentiality and security of patient data is upheld. This requires a certain transparency, which is best secured through a Free development process. Also, safety from data loss can be very important, because some tests are a hazard to health and so sometimes cannot be repeated. If data gets lost, this clearly reduces the quality of the medical service. At the same time it will normally cause a loss of trust by a patient concerning the physician. Therefore this area requires a secure, trustworthy and stable foundation (the operating system) with similar applications. Using Free Software is more and more becoming a necessity for every conscientious physician. Not surprisingly, privacy, protection of data, trustworthiness and security are at the top of the list of Debian-Med’s aims. Other core issues are ease of use and easy installation and administration. A program that is easy to use prevents both errors and frustration, which would work against the patient’s interests. Also, easy installation and administration make sure that
conscientious physicians have fewer problems when moving to Free Software, which should be taken into consideration for practical reasons. Currently the project, which consists of Andreas and about 70 more interested people, looks to find a solution for every common problem and to make it install-ready. The mid-term perspective is to present Debian-Med as a real alternative for physicians and to create a demonstration live CD. The areas in most need of help are documentation and translation, but a logo is also still missing. Andreas has been thinking about a combination of the Debian logo and a snake for this. The license status of the whole project is determined by the individual software licenses, of course. Free Software under a Debian Free Software Guidelines (DFSG) approved license is preferred. Unfortunately the project also plans packaging proprietary software that is distributable at no cost, creating a weakness for the mid- and long-term perspectives. But if enough people express their wish to think in the long term here, I’m sure that this decision is not carved in stone. Bringing Free Software into the medical area is something that I have always considered to be quite important, and if you are looking for a useful project to engage yourself in, Debian-Med will most likely be a good choice.
Gnumed
Figure 1: Gnumed prescription dialog
Among the programs used within Debian-Med is Gnumed, [8] an official GNU project designed for a paperless medical practice.
Figure 2: Gnumed summary screen
The project was born in Australia, where a heated discussion about the dangers of proprietary software in the health sector took place in March 2000. Physicians refused to base their decisions on non-transparent algorithms. Within this discussion Horst Herb was accused of unconstructive criticism, which he took as a trigger to start working on Gnumed. After a first working alpha release was presented at MedInfo2001 in London, the international interest in Gnumed made a total redesign of the internal structure necessary. Implementing this new structure is currently the main task for the project co-ordinators Horst Herb and Karsten Hilbert, who work on this together with about 17 other developers and many volunteers. After completing a minimal version, which they hope will already be useful, it is planned to make Gnumed a complete medical solution which should include decision support. The problems that Gnumed faces on the way are a lack of free pharmaceutical databases, different health systems with different regulations, a lack of data formats, transfer standards and standardised messaging protocols, as well as the lack of a system to create a globally unique ID for a patient. Programming languages used in this project are Python and C/C++ on the client side, and PgSql, C and Python on the server side, with reliability and security being the most important paradigms; both of which are not adequately treated in proprietary solutions in the opinion of
the Gnumed team. In the Gnumed team there are many physicians from many different fields, who know what they want, but often not how to implement it. Therefore some more experienced developers would be a very welcome addition to the Gnumed team. Gnumed, which seeks to have an easy, ergonomic and highly configurable GUI, support for different languages and health systems, as well as relative platform independence in the end, is published as Free Software under the GNU General Public License. If you wish to get active in this sector, Gnumed is surely a project to help with.
OIO
The “Open Infrastructure for Outcomes” (OIO) [9] is called the “Search for the holy grail” of data portability by its author, Andrew Ho. Nandalal Gunaratne, Alexander Chelnokov and others accompany him on this quest. OIO was used for production at the Harbor-UCLA Medical Center in March 2001 before being published as Free Software under the GNU General Public License in August. By September it was managing data of more than 1000 patients, and since February 2002 it has been used as a hospital-wide information system. So it is safe to say that OIO has proven itself already in daily use. The primary components of OIO include the server, which is accessed via any browser through HTML, and the OIO library. The server is a flexible, web-based data management system, which manages users, patients and information about them; although it would of course also be possible to use it for invoices, deliveries or accounts. The OIO library is a metadata repository, which allows exchanging metadata like plug-and-play web forms or project descriptions between server and client. An OIO user can create or modify forms through a web browser; a form is then immediately available to be used for data collection over the web. Later, forms can be exported as XML data to be transferred into a metadata repository like the OIO library or uploaded to another OIO server. Of course it is also possible to assemble data from different forms into a single dataset that can then be searched/queried over the web with the help of logical operations. Although OIO has been used for some time, development is not complete. Among the planned features for future releases is support for wireless PDAs. Plug-and-play protocols will also be supported. Most helpful at the moment would be more users, more feedback and better packaging. At least with the last point Debian-Med should be able to help.

Res Medicinae
Res Medicinae [10] by Christian Heller is also used by the Debian-Med project. Together with Karsten Hilbert he works on making Res Medicinae an extensive software solution in the medical area. To achieve maximum portability, Res Medicinae is based on Java (API/Swing, Servlets/JSP, JDBC) with some
CORBA/IDL and SOAP/XML. This already shows the largest problem of this Free Software project, which is released under the GNU General Public License and GNU Free Documentation License: lacking a full-featured Free Software Java implementation, the freedom of the project is in danger. But freedom was a major motivational factor for Christian to begin working on Res Medicinae. He wants to overcome the very expensive and proprietary scene of medical information systems and give users in less privileged countries access to a free, stable, secure, platform independent and extensive system. The project is still rather young. According to the plans, at the end of 2002 the ResMedLib framework should be consolidated and prototypes for two complete modules should be available. In 2003, the administrative module, printing forms and generating reports should work. Afterwards, an image processing and a management tool as well as a billing and statistical module should be added. A training module as well as a decision support module will then finish the whole project. So you should probably not try to use the project in your daily life just yet, but those who are interested in bringing medical competence, language translations, Java programming or webpage design into the project will receive a warm welcome in Res Medicinae. As far as the authors know, Res Medicinae is currently the only Java based GPL project in the medical area, and they plan to work together with OpenEMed, a similar Java project under a BSD license, and the already mentioned Gnumed project to achieve full project interoperability.
Figure 5: JavaRisk preparing to battle
That should be enough health for today; if there are other projects in this area, I would like to present them in a later issue. An email [1] would be the appropriate way to get this going.
Romance
Romance [11] is the attempt by Bertrand Lamy and Jean-Baptiste Lamy to give Free Software a real, Free alternative to Microsoft’s .NET. According to Bertrand, their motivation is that Ximian will not be able to deliver a Free implementation of .NET. That Microsoft has already promised to fight all Free alternatives using software patents does indeed make this a plausible scenario. Also, standards controlled by such companies without any Free reference implementation always have the advantage that the company is several steps ahead, while the Free projects have many problems following.

Figure 4: Res Medicinae forms aid open source doctors
The situation around Java suffers from this effect. The answer is clear: We need a Free standard with a Free implementation. This is what Romance seeks to provide. The first part – and the beginning of development – is Rose, the “Romance Object System rosE”. Rose provides a protocol which allows for the sharing of objects between different programming languages. The next step of development will be WiSe, the “Romance Widget Server”. It will be available as a GUI/toolkit library to all Romance applications through the Romance server. The paradigm employed in WiSe is that all widgets remain the property of the WiSe process, and not of the different applications. That should allow Romance to make sharing of widgets very fast and simple. Since Bertrand and Jean-Baptiste believe that 75% of all desktop applications should be written in script languages, they have concentrated on supporting Python, Guile and C first. According to their plans, Rose will also support Perl, Ruby, Lisp, Scheme and other dynamic languages in the future. There are many examples of how Romance can be used. For large applications, it is often a good idea to define an expansion language. Instead of choosing one
language, a group of objects could be made available through Romance, which allows scripting the application in any language supported by Romance. Networked applications often require a communication protocol, which can be supplied by Romance. Since it is lighter than CORBA, supports more languages than Java RMI and works with dynamic, non-static objects, Romance offers several advantages in this regard. Romance can also act as “glue” between different parts of a program written in different programming languages, providing an alternative to SWIG. Being dynamic, Romance automatically makes sure that the interface is available in all of the “Romantic” languages. Through WiSe, Romance also provides widgets for a graphical user interface, which can be shared in a lean and efficient manner between different processes. This makes linking against graphical toolkits unnecessary and allows the user to choose the look & feel of all applications through Romance. While offering all these possibilities, Rose is very small: it has less than 500 lines of code in Python/Scheme. Although this project, which is released under the GNU General Public License, is most relevant to developers, it should also give non-developers some interesting perspectives for the future.
JavaRisk
As an add-on to issue #39, [12] in which some Risk clones were presented, I would like to introduce JavaRisk [13] by Christian Domsch, Sebastian Kirsch and Andreas Habel. JavaRisk is an implementation of the well-known board game Risk in Java under the GNU General Public License, with the rules based entirely on the German version of the game. Despite JavaRisk being a computer game, the authors did not implement any network support or artificial intelligence. It has outstanding graphics, though, which are so good that the game J-TEG introduced in issue #39 implemented them as a theme. JavaRisk is a typical student project, which means that the three authors did not stop playing while their professor was around.
Figure 6: Using XPM chess pieces from Xboard. The tarball includes 30 different sets
When he noticed the game, he immediately loved it. He suggested that they should write an artificial intelligence for the game as their 5th semester student project. Also, since he is a fan of everything Asian, he asked them to implement small animated Samurai fighters to be displayed whenever China or Japan is being attacked. Currently Christian, Sebastian and Andreas are working on a new version of JavaRisk, which will bear more resemblance to a strategic war game like Empire. JavaRisk v2 will support networked play right from the start.
EmacsChess
Also in reaction to issue #39, Mario Lang wrote to me and recommended writing about the Emacs Chess [14] project by John Wiegley. Emacs Chess consists of three major parts. The first part contains the
display/front-end capabilities in order to display different types of boards in Emacs. The second part allows communication with different chess engines like GNU Chess and Crafty. The third part is a library for positions and games, including a validity checker for moves and game database management. Among the very neat features of Emacs Chess is that the Emacs IRC Client (ERC) [15] supports Emacs Chess, so it is possible to start a game with somebody in IRC if that person also uses Emacs Chess and ERC. Since the IRC chess protocol is based on CTCP, it is possible to implement compatible functionality in other clients. As Emacs can also run on a console, Emacs Chess also provides a nice chess front-end for vision-impaired users, who can have moves announced in a “knight takes a4” form. Of course non-vision-impaired people may also use this very neat feature. People not using Emacs will probably not start doing so because of Emacs Chess, but it should truly be worth a try for all Emacs friends. Emacs Chess is written in Emacs Lisp and published under the GNU General Public License as beta-test software, so you may expect some minor bugs.
End
Enough Brave GNU World for this month. Ideas, suggestions and comments for interesting projects are, as always, welcome at the usual address. ■
INFO
[1] Send ideas, comments and questions to Brave GNU World: column@brave-gnu-world.org
[2] Home page of the GNU Project: http://www.gnu.org/
[3] Home page of Georg’s Brave GNU World: http://brave-gnu-world.org
[4] “We run GNU” initiative: http://www.gnu.org/brave-gnu-world/rungnu/rungnu.en.html
[5] Debian-Med home page: http://www.debian.org/devel/debian-med/index.en.html
[6] Brave GNU World issue #23: http://brave-gnu-world.org/issue-23.en.html
[7] Debian-Jr. home page: http://www.debian.org/devel/debian-jr/index.de.html
[8] Gnumed home page: http://www.gnumed.org/
[9] “Open Infrastructure for Outcomes” home page: http://www.txoutcome.org/
[10] Res Medicinae home page: http://resmedicinae.sourceforge.net
[11] Romance home page: http://savannah.gnu.org/projects/romance/
[12] Brave GNU World issue #39: http://brave-gnu-world.org/issue-39.en.html
[13] JavaRisk home page: http://sourceforge.net/projects/javarisk
[14] Emacs Chess home page: http://emacs-chess.sourceforge.net
[15] Emacs IRC Client home page: http://sourceforge.net/projects/erc/
LinuxBierWanderung 2002 – Doolin, Ireland
Hack the Cliffs of Moher

It is getting to be somewhat of a tradition – the fourth annual Linux Beer Hike took place in August this year. After Pottenstein (Germany), Coniston (England) and Bouillon (Belgium), 91 members of the Linux community from all over the world met in Doolin (Ireland) this year to hack, hike and imbibe that old liquid magic. BY HEIKE JURZIK
The arrival of the first hackers on Saturday afternoon immediately saw the first data packages streaming through the hall. In next to no time the wiring was laid, screws were tightened up, tools installed and soon, fingers
were flying over the keyboards. And then disaster struck: The ISDN line in Russell Community Centre refused to work and there was no chance of changing that before Monday morning, as there were no technicians available. But the LBW
would not be the LBW, and geeks would not be geeks if they had not got that fixed in a matter of minutes. Instead of attaching the LBW, which had spread out to occupy most of the village, via the hall, the network specialists installed an aerial in the “Activity Lodge” where some of the participants had set up camp. A wireless LAN based in the Lodge was then used to attach the Hall, the campsite and a house to a working ISDN line. In the Russell Community Centre itself a PC with a wireless LAN card was put in the window (for better reception) and used as a gateway. However, Irish hills and valleys were in the way of a direct link to the campsite. So the PC in the Hall window had to talk to the “Activity Lodge”, which in turn relayed the signal to the house with the ISDN line and to the campsite. This meant that people from the Lodge and the campsite (and probably most of the local Doolin residents) could access the Internet or the Hall network via wireless LAN. Of course a setup like this had to be celebrated – and there were enough pubs in the village of Doolin to allow a “Pub of the Day” to be announced daily. The evening of the first day saw a crowd of exuberant Linux fans invading a room at “McGann’s” with cuddly toys and pint glasses – and although some of the participants were suffering from jet lag following up to 30 hours of travelling, the party was on.
Geeks just wanna have fun!

The locals soon got used to the penguin-toting weirdos and entered into the spirit of the event, dropping in to the Russell Community Centre to chat, have a beer or even take a look at Linux. The LBW is addictive, and by the end of the week at least one local PC had been declared a Windows-free zone. Newbies attending the event were in good hands, and users who had been thinking about moving to a new distribution or even a different operating system soon found willing helpers.
Figure 1: Happy Hacking
Mind you, I did notice some weird rituals going on in the kitchen in the middle of the night, purportedly to appease the new Linux versions… Thanks to a generous donation by the Irish Linux Users Group (ILUG), special brews from the region were up for the taking at the Russell Community Centre from Tuesday onwards. The Biddy Early Brewery supplied ample liquid nourishment for the whole week, and after only a few attempts eventers were capable of pulling a perfect pint of stout (Black Tux), ale (Red Tux) or lager (Blonde Tux).
Have Penguin, will travel
Figure 2: Sacrificed on the Linux Beer Walk Altar
With all that partying going on, there was no avoiding some mental or physical exercise. And with the weather cooperating most of the time, the Linux eventers pulled on their boots, grabbed their GPS receivers and went off to explore the west coast of Ireland. A trip to the “Ailwee Caves” saw intrepid penguins getting into potholing, numerous beaches inspired the eventers to build sand Tuxes, and the nearby “Cliffs of Moher” saw several spontaneous gatherings. Eventers wanting to avoid all that exercise and fresh air had ample opportunity to attend lectures and seminars on firewalling, network management, grid computing or IPv6. A video projector in the Seminar Room at the Town Hall provided valuable service and was only rarely misused for nightly showings of the “Muppet Show”.
After a week full of Linux, beer and walking, most eventers sadly had to leave for home and the daily grind. And although no one is quite sure where the next LBW will take place, one thing is for sure – the penguins will be back! ■
INFO
http://www.lbw2002.draiocht.net/
http://lbw2001.ynfonatic.de/
http://www.lbw2000.eu.org/
http://www.lbw2000.eu.org/lbw99/
http://www.linux.ie/
http://www.beb.ie/
http://www.doolin.com/
Figure 3: Guinness is good for you
Figure 4: Hacking the Cliffs
Subscription CD

On this month’s subscription CD we start with the latest distribution to hit the servers. Alongside the full distribution we have included all the files that we mention in the magazine, in convenient formats.
Debian Woody 3.0
Debian is a completely free Linux operating system for your computer. An operating system is the set of basic programs and utilities that make your computer run. Debian uses the Linux kernel (the core of an operating system), but most of the basic operating system tools come from the GNU project; hence the name GNU/Linux. Debian GNU/Linux provides more than just a pure operating system: it comes with more than 8710 packages, pre-compiled software bundled up in a nice format for easy installation on your machine.
Getting Started

The latest stable release of Debian GNU/Linux is Woody 3.0r0. The system includes the KDE and GNOME desktop environments, features cryptographic software, is compatible with the FHS v2.2 and supports software developed for the LSB. This is the first version of Debian that features cryptographic software integrated into the main distribution. OpenSSH and GNU Privacy Guard are included in the default installation, and strong encryption is now present in web browsers, web servers, databases, and so forth.

For the first time, Debian comes with the K Desktop Environment 2.2 (KDE). The GNOME desktop environment is upgraded to version 1.4, and X itself is upgraded to the much improved XFree86 4.1. With the addition of several full-featured free graphical web browsers in the form of Mozilla, Galeon and Konqueror, Debian’s desktop offerings have radically improved.

This version of Debian supports the 2.2 and 2.4 releases of the Linux kernel. Along with better support for a greater variety of new hardware (such as USB) and significant improvements in usability and stability, Debian GNU/Linux 3.0 features a more streamlined and polished installation. The task system has been revamped and made more flexible, and the debconf tool makes configuration of the system easier and more user friendly.

This release of Debian is compatible with version 2.2 of the Filesystem Hierarchy Standard (FHS) and also supports software developed for the Linux Standard Base (LSB), though it is not yet LSB certified. It features aptitude, which we covered in last month’s Linux Magazine, as an alternative to the venerable dselect program, making it easier to select packages. About four thousand new software packages were added to the distribution in Debian GNU/Linux 3.0.

ROX

The ROX desktop is an easy to use (and understand!) graphical desktop, based around the ROX-Filer, which is small, fast and powerful.
KBear

A graphical FTP client that allows you to access several servers at the same time. We include the Mandrake RPM and the source RPM, along with the source tarball.
Leocad

LeoCAD is a CAD program based around Lego bricks. Currently it has a library of more than 1000 different pieces.
tHTTPd

Although small in size, tHTTPd (the tiny HTTP daemon) is a perfectly formed miniature web server. It is ideal for when you just need to give other users browser-based access.
PhotoPC

A command-line tool which makes automating digital image processing easier in a scripting environment.
OpenSSH

OpenSSH has become the standard tool for providing encrypted remote access. With the article starting on page 50 you can implement the security features that OpenSSH offers.
MRTG

The Multi Router Traffic Grapher’s speciality is monitoring network traffic and displaying the results graphically. It also retrieves miscellaneous SNMP variables and can be customized to fulfill the desires of the administrator.
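To give an idea of how MRTG is driven, everything happens in a plain-text configuration file read by the mrtg program. The following is a minimal sketch – the router address, SNMP community string and paths are assumptions for illustration only:

# Minimal MRTG configuration sketch; host, community string and
# WorkDir are assumptions for illustration only.
WorkDir: /var/www/mrtg
# Watch interface 2 of a router, queried via the SNMP community "public".
Target[router]: 2:public@192.168.1.1
# 1250000 bytes/s corresponds to a 10 Mbit/s link.
MaxBytes[router]: 1250000
Title[router]: Traffic on router port 2
PageTop[router]: <H1>Traffic on router port 2</H1>

Running mrtg against this file from cron every five minutes then keeps the HTML pages and traffic graphs in the WorkDir up to date.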
Subscribe & Save Save yourself hours of download time in the future with the Linux Magazine subscription CD! Each subscription copy of the magazine includes a CD like the one described above free of charge. In addition, a subscription will save you over 16% compared to the cover price, and it ensures that you’ll get advanced Linux Know-How delivered to your door every month. Subscribe to Linux Magazine today! Order Online: www.linux-magazine.com/Subs Or use the order form between p66 and p67 in this magazine.