
Dr. Dobb's Journal, Volume 30, Issue 5, Number 372, May 2005

Page 1

Page 2

http://www.ddj.com

#372 MAY 2005

Dr. Dobb's Journal
SOFTWARE TOOLS FOR THE PROFESSIONAL PROGRAMMER

ALGORITHMS
Bayesian Text Classification
Numerical Computation & Elliptic Functions
Optimizing Optimal Queens
A Multifield Single-Pass Shell Sort
Planarity Edge Addition
Processing DBMS Rows

ASP to ASP.NET Migration Strategy
Windows Forms & Win32
Eclipse & Custom Class Loaders
Jerry Pournelle on Bluetooth

Python 2.4 Decorators
Multithreaded Technology & Dual-Core Processors

Maps as Computer Images
Battle of the Code Generators

Page 3

DR. DOBB'S JOURNAL (ISSN 1044-789X) is published monthly by CMP Media LLC., 600 Harrison Street, San Francisco, CA 94107; 415-947-6000. Periodicals Postage Paid at San Francisco and at additional mailing offices. SUBSCRIPTION: $34.95 for 1 year; $69.90 for 2 years. International orders must be prepaid. Payment may be made via Mastercard, Visa, or American Express; or via U.S. funds drawn on a U.S. bank. Canada and Mexico: $45.00 per year. All other foreign: $70.00 per year. U.K. subscribers contact Jill Sutcliffe at Parkway Gordon 01-49-1875-386. POSTMASTER: Send address changes to Dr. Dobb's Journal, P.O. Box 56188, Boulder, CO 80328-6188. Registered for GST as CMP Media LLC, GST #13288078, Customer #2116057, Agreement #40011901. INTERNATIONAL NEWSSTAND DISTRIBUTOR: Worldwide Media Service Inc., 30 Montgomery St., Jersey City, NJ 07302; 212-332-7100. Entire contents © 2005 CMP Media LLC. Dr. Dobb's Journal is a registered trademark of CMP Media LLC. All rights reserved.

http://www.ddj.com Dr. Dobb’s Journal, May 2005 5

CONTENTS
MAY 2005, VOLUME 30, ISSUE 5

NEXT MONTH: Testing and debugging are the focus of our June issue.

Naïve Bayesian Text Classification 16
by John Graham-Cumming
Spam filtering may be the best known use of naïve Bayesian text classification, but it's not the only application.

Numerical Computation of Elliptic Functions 22
by Michael W. Pashea
Michael examines three fundamental elliptic functions, then presents C source code to demonstrate their use.

Optimal Queens 32
by Timothy Rolfe
Optimal Queens is a classic problem in mathematics and computer science. Timothy optimizes it in C and Java.

A Multifield Single-Pass Shell Sort Algorithm 38
by MacGregor K. Phillips
This enhancement to the venerable shell sort algorithm lets you sort on different types of fields.

Planarity by Edge Addition 42
by John M. Boyer
Planarity is an important category in graph theory with applications ranging from circuit layout to web-site design.

Processing Rows in Batches 46
by Steven F. Lott and Robert Lucente
To avoid sorting all of the rows in the table, focus your sorting on just a subset of those rows.

TileShare: Maps as Computer Images 50
by Hrvoje Lukatela and John Russell
TileShare is a cross-platform file format and library for efficiently manipulating scanned map images.

Python 2.4 Decorators 54
by Phillip Eby
Decorators are a powerful Python 2.4 feature that helps you reduce code duplication and consolidate knowledge.

Multithreaded Technology & Multicore Processors 58
by Craig Szydlowski
Many software applications are about to be turned upside-down by the transition of CPUs from single- to multicore implementations.

Battle of the Code Generators 61
by Gigi Sayfan
Code generation involves generating source code in some target programming language from some simpler input.

ASP to ASP.NET Migration Strategy 66
by Mark Sorokin
Migration from ASP to ASP.NET can be done in different ways. Understanding possible paths leads to optimal strategies.

Windows Forms and Win32 74
by Richard Grimes
To effectively use Windows Forms, you must have an understanding of how Win32 windowing works.

EMBEDDED SYSTEMS

Eclipse & Custom Class Loaders 78
by Greg Bednarek
All classes used in a Java application are loaded by the System class loader, or a custom, user-defined class loader.

COLUMNS

Programming Paradigms 82
by Michael Swaine

Chaos Manor 85
by Jerry Pournelle

FORUM

EDITORIAL 8
by Jonathan Erickson

LETTERS 10
by you

DR. ECCO'S OMNIHEURIST CORNER 12
by Dennis E. Shasha

NEWS & VIEWS 14
by Shannon Cochran

OF INTEREST 95
by Shannon Cochran

SWAINE'S FLAMES 96
by Michael Swaine

RESOURCE CENTER
As a service to our readers, source code, related files, and author guidelines are available at http://www.ddj.com/. Letters to the editor, article proposals and submissions, and inquiries can be sent to [email protected], faxed to 650-513-4618, or mailed to Dr. Dobb's Journal, 2800 Campus Drive, San Mateo CA 94403.

For subscription questions, call 800-456-1215 (U.S. or Canada). For all other countries, call 902-563-4753 or fax 902-563-4807. E-mail subscription questions to [email protected] or write to Dr. Dobb's Journal, P.O. Box 56188, Boulder, CO 80322-6188. If you want to change the information you receive from CMP and others about products and services, go to http://www.cmp.com/feedback/permission.html or contact Customer Service at the address/number noted on this page.

Back issues may be purchased for $9.00 per copy (which includes shipping and handling). For issue availability, send e-mail to [email protected], fax to 785-838-7566, or call 800-444-4881 (U.S. and Canada) or 785-838-7500 (all other countries). Back issue orders must be prepaid. Please send payment to Dr. Dobb's Journal, 4601 West 6th Street, Suite B, Lawrence, KS 66049-4189. Individual back articles may be purchased electronically at http://www.ddj.com/.

Embedded Space 87
by Ed Nisley

Programmer's Bookshelf 91
by Gregory V. Wilson

Page 4

PUBLISHER Michael Goodman
EDITOR-IN-CHIEF Jonathan Erickson

EDITORIAL
MANAGING EDITOR Deirdre Blake
MANAGING EDITOR, DIGITAL MEDIA Kevin Carlson
SENIOR PRODUCTION EDITOR Monica E. Berg
NEWS EDITOR Shannon Cochran
ASSOCIATE EDITOR Della Wyser
ART DIRECTOR Margaret A. Anderson
SENIOR CONTRIBUTING EDITOR Al Stevens
CONTRIBUTING EDITORS Bruce Schneier, Ray Duncan, Jack Woehr, Jon Bentley, Tim Kientzle, Gregory V. Wilson, Mark Nelson, Ed Nisley, Jerry Pournelle, Dennis E. Shasha
EDITOR-AT-LARGE Michael Swaine
PRODUCTION MANAGER Eve Gibson

INTERNET OPERATIONS
DIRECTOR Michael Calderon
SENIOR WEB DEVELOPER Steve Goyette
WEBMASTERS Sean Coady, Joe Lucca

AUDIENCE DEVELOPMENT
AUDIENCE DEVELOPMENT DIRECTOR Kevin Regan
AUDIENCE DEVELOPMENT MANAGER Karina Medina
AUDIENCE DEVELOPMENT ASSISTANT MANAGER Shomari Hines
AUDIENCE DEVELOPMENT ASSISTANT Melani Benedetto-Valente

MARKETING/ADVERTISING
ASSOCIATE PUBLISHER Will Wise
SENIOR MANAGERS, MEDIA PROGRAMS (see page 94) Pauline Beall, Michael Beasley, Cassandra Clark, Ron Cordek, Mike Kelleher, Andrew Mintz
MARKETING DIRECTOR Jessica Marty
SENIOR ART DIRECTOR OF MARKETING Carey Perez

DR. DOBB'S JOURNAL
2800 Campus Drive, San Mateo, CA 94403; 650-513-4300. http://www.ddj.com/

CMP MEDIA LLC
Gary Marshall President and CEO
John Day Executive Vice President and CFO
Steve Weitzner Executive Vice President and COO
Jeff Patterson Executive Vice President, Corporate Sales & Marketing
Leah Landro Executive Vice President, Human Resources
Mike Mikos Chief Information Officer
Bill Amstutz Senior Vice President, Operations
Sandra Grayson Senior Vice President and General Counsel
Alexandra Raine Senior Vice President, Communications
Kate Spellman Senior Vice President, Corporate Marketing
Mike Azzara Vice President, Group Director of Internet Business
Robert Faletra President, Channel Group
Vicki Masseria President, CMP Healthcare Media
Philip Chapnick Vice President, Group Publisher Applied Technologies
Michael Friedenberg Vice President, Group Publisher InformationWeek Media Network
Paul Miller Vice President, Group Publisher Electronics
Fritz Nelson Vice President, Group Publisher Network Computing Enterprise Architecture Group
Peter Westerman Vice President, Group Publisher Software Development Media
Joseph Braue Vice President, Director of Custom Integrated Marketing Solutions
Shannon Aronson Corporate Director, Audience Development
Michael Zane Corporate Director, Audience Development
Marie Myers Corporate Director, Publishing Services

Dr. Dobb's Journal
SOFTWARE TOOLS FOR THE PROFESSIONAL PROGRAMMER

American Business Press

Printed in the USA


Page 5

Dual-core processors, those devices that effectively give you two processors on a single CPU, are the talk of the town. The reason why is that dual-core processors can process twice as much data per clock, while handling more threads. In other words, when compared to their single-core cousins, multicore processors can run at slower speeds and lower voltages, but still deliver higher performance. That's the promise anyway.

To date, both AMD and Intel have announced dual-core offerings. At LinuxWorld, for instance, AMD demonstrated its dual-core AMD Opteron processors running systems from Cray, HP, and Sun. For the time being, AMD is laying claim as the only vendor to publicly demonstrate x86 dual-core server solutions. For its part, Intel's plans revolve around its dual-core Pentium Extreme Edition, which includes Hyper-Threading Technology (HTT) that processes four threads simultaneously, and its non-HTT Pentium D processor. Dual-core architectures were in the spotlight at the recent Intel Developers Forum. Not to be left out, Analog Devices has extended its Blackfin processor family with the dual-core ADSP-BF561. Then there's the ARM PrimeXsys dual-core, the Texas Instruments OMAP5910 dual-core processor, the Atmel dual-core DIOPSIS 740 DSP, and so on. You get the picture.

So far, most dual-core activity has been on the server side. IBM, for instance, has been offering dual-core implementations of its Power4 and Power5 for a couple of years, and both AMD and Intel have targeted the server market up to now. Still, they all have designs on the desktop. AMD recently showed off its "Toledo," which sports two Athlon 64 processors on the same chip, and which specifically targets desktop systems. Likewise, Intel's "Smithfield" targets the desktop, while its yet-to-be-released "Yonah" is designed for laptops. In short, it appears that Intel, AMD, and other vendors will be moving entire processor lines to multicore architectures over the next few years. According to Intel, 70 percent of all its desktop and mobile processors, and 85 percent of all server processors shipped will be dual core by the end of 2006. Moreover, Intel plans on having devices with four cores running up to eight threads each by the end of the decade.

This is exciting stuff for all computer users, but especially software developers. Well, maybe "exciting" isn't the right word. "Challenging" might be a better way to describe what lies ahead, although some might say "a pain in the keister" is a better fit. Nevertheless, as Herb Sutter pointed out in "A Fundamental Turn Toward Concurrency in Software" (DDJ, March 2005) and Craig Szydlowski does in this month's "Multithreaded Technology & Multicore Processors," we're about to enter a new world in which terms like "multithreaded," "concurrency," and "parallelism" are the norm, rather than the exception. Dealing with multithreaded applications that run on multicore machines will likely require new tools, new techniques, and a new way of thinking.

That's the good news. On the flip side, all kinds of new nontechnical issues come into play. Take software licensing, for instance. According to Craig Szydlowski, Microsoft says it intends on licensing software on a per-processor-package basis. In other words, one license for one processor, no matter how many cores are in the CPU. This will surely seed the market and create demand. Other companies haven't made this commitment to the future, however. Although it hasn't made a formal statement, Oracle seems to be leaning towards licensing its software on a per-core basis. A dual-core processor would require two licenses, even though it is running on a single machine. Likewise, BEA hasn't made a firm commitment, although it reportedly is considering a middle-of-the-road approach, whereby licensing might be on a per-core basis, but at 1.25 times the cost of a single-core processor, instead of two times the price.

The bottom line is that multicore computing is a question of "when," not "if." When will the processors be affordable enough for widespread adoption? When will the development tools be there to build software? And when will the applications be there to take advantage of the powerful capabilities of multicore machines? As usual in the computer industry, the answer is "real soon now."

♦ ♦ ♦

On another note, we had multiple winners in our recent Mars Rescue Mission Challenge (http://www.frank-buss.de/marsrescue/). Please join me in congratulating Kevin Shepherd, Randy Sargent, David Finch, Matthew Ogilvie, Jeremie Allard, Stefan Ram, and Allen Noe. They each received a Dr. Dobb's CD-ROM Release 16 for their efforts. And a special thanks to Frank Buß for conceiving, coordinating, and judging the challenge.

Two for the Price of One—Maybe

Jonathan Erickson
[email protected]

EDITORIAL


Page 6

More Licensing & Such

Dear DDJ,
In the "Letters" section of the March 2005 DDJ, Jim Wiggins uses a Hyundai recall as an example of why software developers should be licensed. Jim cites a series of crash tests in which the Hyundai Elantra's air bags failed to deploy. He suggests that this failure was due to errors and process problems on the part of the developers of the car's software. This is too simplistic. Just because the fix is in the software, it doesn't follow that the software was written badly. It is just as likely that Hyundai's hardware engineers changed the hardware, without bothering to have the software tested and updated as well.

The following scenario seems quite feasible: A car has been in development for a few years. The software has been developed and tested, with good processes and rigor and QA, and the team has moved on to the next project. Later, when the car is actually being manufactured, an alternative for some hardware component becomes available. The new alternative is cheaper, and is supposed to be 100-percent compatible. Management elects to substitute that part on the assembly line, and the car becomes a little more profitable. Only later is it discovered that the part isn't quite the drop-in replacement it was supposed to be, and it broke the software. The software team was never asked or allowed to test the new configuration, because management assumed it would "just work." They may not even have been in the loop for the change.

Lug nuts, hubcaps, ashtrays, airbag sensors, it's all the same. Parts are parts, right? Just buy the cheapest one you can find. It would hardly be a surprise if they forgot that some parts can't be exchanged as easily as others, and require the software to be updated.

No software development licensing scheme could fix this problem, because the problem isn't the developer, it's the management. There may be horror stories to justify licensing, but this probably isn't one of them, based on the information available.

Jonathan
[email protected]

Smart Stuff

Dear DDJ,
The idea of "smart firearms" that Jonathan Erickson discusses in his March 2005 "Smart Stuff" editorial is intellectually interesting, but utterly impractical. As the saying goes, "In theory, there is no difference between theory and practice. But, in practice, there is." Years of research and battlefield testing have gone into making modern handguns extremely reliable. Adding electronics and algorithms to a handgun, which is fundamentally a mechanical device, will decrease its reliability. From a hardware perspective, possible failures include the battery going flat, a connection coming undone due to recoil or poor quality control, or electronic component failure. From a software or firmware standpoint, it is virtually impossible to prove that a significantly complex piece of software such as this will be bug free. Look at the 1991 Patriot missile defense system failure, for example (http://www.fas.org/spp/starwars/gao/im92026.htm). If the DoD can't write bug-free software with its budget and rigorous process, who can? Look at the Hyundai Elantra airbag deployment failure that reader Jim Wiggins reported in this very DDJ issue.

From a human factors point of view, physical characteristics can vary hugely under stress or circumstances. This is extremely likely to affect someone's grip pattern and trigger pull. For example, you could be shooting using your weak hand because your other hand is incapacitated, or your hand could be slippery from grease, blood, or sweat. Unjustifiably preventing a person from firing his or her weapon in a life-or-death situation is likely to lead to wrongful death, arguably an even more tragic event than accidental death.

Although it is extremely regrettable that 30,000 people a year die from firearm-related deaths, to put this in perspective, the CDC database (http://webapp.cdc.gov/cgi-bin/broker.exe) shows that from 1999–2002, for all ages and races, and for both sexes, the 5th highest cause of death was "unintentional injury" (the category into which firearm-related deaths fall). This contributed 404,039 deaths over the four-year period. Of these, 169,467 (41.9 percent of unintentional deaths) were due to "MV traffic." Only 3164 (0.8 percent of unintentional deaths) were due to "Firearm." The total number of people who died due to the top 10 causes of death in that time period was 7,633,432. This means that relative to the top 10, accidental death due to firearms was only 0.04 percent, while "MV Traffic" was 2.22 percent. Heart disease (#1) was a whopping 37.11 percent. Even if the figure of 30,000 a year were used, this would increase the percentage to only about 1.6 percent of the top 10 causes of death (assuming a constant rate of firearm deaths over the same four-year period).

Perhaps the research funding could be better spent on making vehicles that are safer to drive, or on firearms safety education. A properly secured firearm cannot discharge accidentally.

Edwin
[email protected]

The Printed Page

Dear DDJ,
Not long ago, I suddenly realized that I can read DDJ in any position, regardless of the light. Upon examining the paper, I then realized that it does not reflect the light, compared to past issues. This small change means a lot to me. Thank you. We programmers think about algorithms, program errors, and the like at different places and times. It is a pleasure now to take DDJ for a quick look anytime, anywhere, without having to turn the magazine round and round to avoid light reflection on the paper.

Stefan
[email protected]

Silent Application Update—Sheer Madness!

Dear DDJ,
Belatedly, but nonetheless horrified, I read "Silent Application Update" (DDJ, November 2004). One of my jobs is IT support for a motley group of nontechnical PC users. The last thing I would ever want an application or, worse still, an operating system, to do is update silently. This is the sacrifice of the lamb of software security and maintenance on the altar of automation. I have seen nonautomatic updates cripple and freeze many Windows systems. I shudder to think how I could ever trace some strange new behavior of a PC I might be asked to fix back to a silent update of an application. Maybe silent updates would work on a uniform set of the latest PCs, used by IT professionals under the strict guidance of a corporate IT police. In the real world, this is sheer madness.

Andrew
[email protected]

DDJ

LETTERS



Page 7

DR. ECCO'S OMNIHEURIST CORNER

Michael Sturm handed Ecco his business card. The listed profession: Geometer-Farmer. "I'm a rather unusual farmer," he said after noting Ecco's smile. "My passions in fact are geometry and mechanics. I have designed sprinklers that can move around a radius of up to 1.5 kilometers, for example. The farmer part is familial. My brother and I have just bought a rectangular property that is 1 kilometer north-south by 2 kilometers east-west.

"We want to water all of our land without watering too much area twice and without watering outside our rectangle. So, we measure cost (or overhead, if you wish) as the area outside the rectangle that receives water plus the area within the rectangle having more than one sprinkler circle covering it. We want to minimize cost while ensuring that every bit of our farm is watered."

Liane interrupted briefly: "So if some area gets hit by three sprinklers, then you count that the same as if it were hit by just two?"

"Yes, good question," said Sturm, shaking Liane's hand. "Not everyone picks up that subtle point. Now here are my questions. For k=5:

1. What are the radii of k circles that will cover the entire rectangle while minimizing cost?

2. Answer the same question if all the sprinkler radii must be the same."

Sturm went on, but Tyler and Liane could not solve the next questions. So these are still open:

3. How do your answers change as k increases, say, to 10, 20, and 100?

4. For a given k, what is the rectangle whose aspect ratio would be best and that would allow one to cover the rectangle at minimum cost?
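As a sanity check on Sturm's cost measure (my own illustration, not part of the column), consider the degenerate case k=1: a single circle covering the 1×2 km rectangle must reach a corner from the center, and the cost is simply the watered area outside the rectangle, since one circle cannot overlap itself.

```python
import math

# Smallest single circle covering a 1 km x 2 km rectangle: centered on the
# rectangle, reaching a corner, so r = sqrt(0.5^2 + 1^2).
r = math.sqrt(0.5 ** 2 + 1.0 ** 2)      # ~1.118 km

# With one circle there is no double-watered area, so the cost is just the
# circle's area minus the rectangle's area.
cost = math.pi * r ** 2 - 1.0 * 2.0     # ~1.927 square km

print(round(r, 3), round(cost, 3))      # prints: 1.118 1.927
```

Nearly two square kilometers of wasted water for one sprinkler shows why splitting the farm among k smaller circles is worth optimizing.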

Reader Solutions to "Dig That!"

Michael Birken and Rick Kaye came up with several very clever solutions to the "Dig That!" puzzle (DDJ, February 2005). The problem was to find the route of an underground tunnel using probes at the intersections of a road grid. Each probe could determine the entering and leaving directions of the tunnel if the tunnel were present. Mike's five-probe solution for a tunnel of length 8 is available at http://cs.nyu.edu/cs/faculty/shasha/papers/digthat8.PNG. For tunnels of lengths 10 and 12, he came up with 8- and 15-probe solutions, though he is not sure of optimality.

For the solution to last month’s puzzle, see page 83.

DDJ

Optimal Farming

Dennis E. Shasha

Dennis is a professor of computer science at New York University. His most recent books are Dr. Ecco's Cyberpuzzles (2002) and Puzzling Adventures (2005), both published by W.W. Norton. He can be contacted at [email protected].

12 Dr. Dobb’s Journal, May 2005 http://www.ddj.com

Figure 1.


Page 8


SHA-1 Cracked—In Theory

The Secure Hash Algorithm-1 (SHA-1) has been one of the world's most popular hash algorithms ever since it was developed by the U.S. National Security Agency in 1995. Recently, however, Xiaoyun Wang, Yiqun Lisa Yin, and Hongbo Yu, who are researchers at China's Shandong University, claim to have devised a technique by which the algorithm can theoretically be compromised 2000 times more quickly than with a brute-force approach. For details, see "Collision Search Attacks on SHA1" (http://theory.csail.mit.edu/~yiqun/shanote.pdf). SHA-1 is used to create digital signatures by protocols such as the Secure Sockets Layer (SSL).

ICFP 2005 Programming Contest Announced

The ground rules have been laid for this year's International Conference on Functional Programming contest (http://icfpc.plt-scheme.org/). The first problem will be posted June 24th; initial entries are due by June 27. The revised problem will be announced on July 9. This year marks the eighth annual ICFP programming contest. Last year, programmers were challenged to "design an ant colony that will bring the most food particles back to its anthill, while fending off ants of another species." To win the contest, entrants had to submit the neural wiring for the ants in their colony: "a text file containing code for a simple, finite state machine that is run by all of your ants." The programs were then pitted against each other in a tournament to determine the winner. Prizes this year have not been announced, but have traditionally included cash as well as "the satisfaction of hearing the judges proclaim your programming language 'the programming tool of choice for discriminating hackers.'" Haskell took the honors in 2004.

Cerf and Kahn Win Turing Award

Vint Cerf and Robert Kahn have been named winners of the 2004 A.M. Turing Award, "for pioneering work on the design and implementation of the Internet's basic communications protocols." The Turing Award carries a $100,000 prize sponsored by Intel. It was established in 1966, when A.J. Perlis became the first honoree, and has been awarded every year since. Cerf and Kahn worked together at DARPA beginning in 1973. In the course of a project to integrate three independent networks, they developed the concepts of routers, IP addresses, and the TCP protocol, which they published in 1974. Four years later, Cerf and several colleagues split the original TCP protocol into two parts, thus inventing TCP/IP. This year's Turing Award will be formally bestowed at the annual ACM Awards Banquet in San Francisco.

IBM Contributes to Open Source

IBM has contributed more than 30 open-source projects to SourceForge.net (http://www.ostg.com/), including the Jikes Java compiler. Other open-source contributions include projects revolving around Apache Derby, Eclipse, Globus, Linux, and PHP. In addition to source code, the contributions include articles, tutorials, forums, blogs, plug-ins, and the IBM Linux Software Evaluation Kit. All in all, IBM has contributed more than 120 collaborative projects to the open-source community. IBM has also teamed up with Zend Technologies to develop integrated software based on PHP using IBM's Cloudscape database.

…And So Does Adobe

Not to be outdone, Adobe has launched an open-source web site of its own at http://opensource.adobe.com/. Adobe's open-source web site is the home for the Adobe Source Libraries (ASL) and information about other Adobe open-source projects. ASL provides portable, peer-reviewed C++ source libraries for leveraging and extending both the C++ Standard Library and the Boost Libraries. The first two libraries available, called Adam and Eve, are components for modeling the human interface appearance and behavior in a software application. They are written in C++ and have been released under the MIT License, an OSI-approved open-source license.

Machine-Learning Algorithms Applied to HIV Research

Microsoft researchers David Heckerman and Nebojsa Jojic have collaborated with HIV researchers to apply machine-learning and data-mining algorithms used in computer science to develop new approaches to creating HIV vaccine models. In particular, the algorithms let Microsoft database software identify patterns within large computer databases. Software based on these algorithms combed through hundreds of genetic sequences, and tested millions of different possible combinations of epitopes and immune types. Interestingly, Microsoft has used similar algorithms to help differentiate spam from legitimate e-mail.

According to Simon Mallal, professor at the Centre for Clinical Immunology and Biomedical Statistics at Royal Perth Hospital and Murdoch University in Australia, the key to fighting HIV is to find patterns in how it mutates to create versions of the virus that can escape recognition by the carrier's immune system. By uncovering patterns in different patients, the researchers believe they can more accurately predict the HIV epitopes needed to train the immune system to recognize and fight the virus. He added that, regardless of how successful these approaches are, they may help with the design of vaccines for other mutating viruses. For more information, see http://www.microsoft.com/presspass/features/2005/feb05/02-23HIVResearch.asp.

Yahoo Intros Search APIs, Developer Network

Yahoo has launched its Yahoo Search Developer Network (http://developer.yahoo.net/), an online resource offering developers access to web-service APIs for Yahoo Search products, along with existing APIs. Yahoo sees YSDN as a place where developers can share code, ideas, and applications that extend the company's search technology. According to Yahoo, each API provides developers with access to 5000 queries per day per API, five times more than the limits placed on users of the Google Web API. (Google offers an API as well.) The site provides SDKs, documentation, FAQs, and the like for developing apps that use Yahoo's search web services.

In Defense of Open-Source Programmers

Columbia University law professor and open-source advocate Eben Moglen has formed the Software Freedom Law Center to provide legal representation and other law-related services to advance free and open-source software (http://www.softwarefreedom.org/). Among the pro bono services offered by the SFLC are license development and implementation consulting; legal consulting and lawyer training; and legal defense against litigation.

Dr. Dobb's News & Views
MAIN NEWS SECTION
DR. DOBB'S JOURNAL, May 1, 2005

Page 9

Paul Graham popularized the term "Bayesian Classification" (or more accurately, "Naïve Bayesian Classification") after his "A Plan for Spam" article was published (http://www.paulgraham.com/spam.html). In fact, text classifiers based on naïve Bayesian and other techniques have been around for many years. Companies such as Autonomy and Interwoven incorporate machine-learning techniques to automatically classify documents of all kinds; one such machine-learning technique is naïve Bayesian text classification.

Naïve Bayesian text classifiers are fast, accurate, simple, and easy to implement. In this article, I present a complete naïve Bayesian text classifier written in 100 lines of commented, nonobfuscated Perl.

A text classifier is an automated means of determining some metadata about a document. Text classifiers are used for such diverse needs as spam filtering, suggesting categories for indexing a document created in a content management system, or automatically sorting help desk requests.

The classifier I present here determines which of a set of possible categories a document is most likely to fall into, and can be used in any of the ways mentioned with appropriate training. Feed it samples of spam and nonspam e-mail and it learns the difference; feed it documents on various medical fields and it distinguishes an article on, say, "heart disease" from one on "influenza." Show it samples of different types of help desk requests and it should be able to sort them so that when 50 e-mails come in informing you that the laser printer is down, you'll quickly know that they are all the same.

The Math

You don't need to know any of the underlying mathematics to use the sample classifier presented here, but it helps.

The underlying theorem for naïve Bayesian text classification is the Bayes Rule:

P(A|B) = ( P(B|A) * P(A) ) / P(B)

The probability of A happening given B is determined from the probability of B given A, the probability of A occurring, and the probability of B. The Bayes Rule enables the calculation of the likelihood of event A given that B has happened. This is used in text classification to determine the probability that a document B is of type A just by looking at the frequencies of words in the document. You can think of the Bayes Rule as showing how to update the probability of event A happening given that you've observed B.
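To make the update concrete, here is a small numeric sketch (the numbers and the word "refinance" are invented purely for illustration, and the snippet is Python rather than the article's Perl):

```python
# Hypothetical numbers, for illustration only: suppose 20% of e-mail is
# spam (P(A)), and the word "refinance" appears in 25% of spam (P(B|A))
# but in only 1% of nonspam.
p_spam = 0.20
p_word_given_spam = 0.25
p_word_given_ham = 0.01

# P(B): overall probability of seeing the word at all
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes Rule: P(A|B) = P(B|A) * P(A) / P(B)
p_spam_given_word = p_word_given_spam * p_word / p_word if False else \
    p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 3))  # 0.862
```

Seeing the word raises the probability of spam from 20 percent to about 86 percent, which is exactly the kind of evidence-driven update the classifier performs for every word at once.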

A far more extensive discussion of the Bayes Rule and its general implications can be found in the Wikipedia (http://en.wikipedia.org/wiki/Bayes%27_Theorem). For the purposes of text classification, the Bayes Rule is used to determine the category a document falls into by determining the most probable category. That is, given this document with these words in it, which category does it fall into?

A category is represented by a collection of words and their frequencies; the frequency is the number of times that each word has been seen in the documents used to train the classifier.

Suppose there are n categories C0 to Cn-1. Determining which category a document D is most associated with means calculating the probability that document D is in category Ci, written P(Ci|D), for each category Ci.

Using the Bayes Rule, you can calculate P(Ci|D) by computing:

P(Ci|D) = ( P(D|Ci) * P(Ci) ) / P(D)

P(Ci|D) is the probability that document D is in category Ci; that is, the probability that given the set of words in D, they appear in category Ci. P(D|Ci) is the probability that for a given category Ci, the words in D appear in that category.

P(Ci) is the probability of a given category; that is, the probability of a document being in category Ci without considering its contents. P(D) is the probability of that specific document occurring.

To calculate which category D should go in, you need to calculate P(Ci|D) for each of the categories and find the largest probability. Because each of those calculations involves the unknown but fixed value P(D), you just ignore it and calculate:

P(Ci|D) = P(D|Ci) * P(Ci)

P(D) can also be safely ignored because you are interested in the relative, not absolute, values of P(Ci|D), and P(D) simply acts as a scaling factor on P(Ci|D).

16 Dr. Dobb’s Journal, May 2005 http://www.ddj.com

Naïve Bayesian Text Classification

JOHN GRAHAM-CUMMING

Fast, accurate, and easy to implement

John is chief scientist at Electric Cloud, which focuses on reducing software build times. He is also the creator of POPFile. John can be contacted at [email protected].


D is split into the set of words in the document, called W0 through Wm-1. To calculate P(D|Ci), calculate the product of the probabilities for each word; that is, the likelihood that each word appears in Ci. Here's the "naïve" step: Assume that words appear independently from other words (which is clearly not true for most languages), so that P(D|Ci) is the simple product of the probabilities for each word:

P(D|Ci) = P(W0|Ci) * P(W1|Ci) * ... * P(Wm-1|Ci)

For any category, P(Wj|Ci) is calculated as the number of times Wj appears in Ci divided by the total number of words in Ci. P(Ci) is calculated as the total number of words in Ci divided by the total number of words in all the categories put together. Hence, P(Ci|D) is:

P(W0|Ci) * P(W1|Ci) * ... * P(Wm-1|Ci) * P(Ci)

for each category, and picking the largest determines the category for document D.
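The whole calculation fits in a few lines. Here is an illustrative Python sketch (not the article's Perl, and using an invented two-category corpus) that scores a document exactly as just described; the small pseudo-count for unseen words is an assumption that anticipates a trick the article applies later:

```python
# Toy word counts per category (invented for illustration)
counts = {
    "veggies": {"potato": 3, "carrot": 2, "salad": 1},
    "fruits":  {"apple": 4, "banana": 2},
}

def score(category, doc_words, pseudo=0.1):
    total_in_cat = sum(counts[category].values())
    total = sum(sum(c.values()) for c in counts.values())
    p = total_in_cat / total                     # P(Ci)
    for w in doc_words:
        # P(Wj|Ci): count of Wj in Ci divided by words in Ci
        # (a small pseudo-count stands in for unseen words)
        p *= counts[category].get(w, pseudo) / total_in_cat
    return p

doc = ["potato", "salad"]
best = max(counts, key=lambda c: score(c, doc))
print(best)  # veggies
```

With six words in each category, "potato" and "salad" give veggies a score of 0.5 * (3/6) * (1/6), while fruits is punished twice by the pseudo-count, so veggies wins.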

A common criticism of naïve Bayesian text classifiers is that they make the naïve assumption that words are independent of each other and are, therefore, less accurate than a more complex model. There are many more complex text classification techniques, such as Support Vector Machines, k-nearest neighbor, and so on. In practice, naïve Bayesian classifiers often perform well, and the current state of spam filtering indicates that they work very well for e-mail classification.

A useful toolkit that implements different algorithms is the freely available Bow toolkit from CMU (http://www-2.cs.cmu.edu/~mccallum/bow/). It makes a useful testbed for comparing the accuracy of different techniques. A good starting point for reading more about naïve Bayesian text classification is the Wikipedia article on the subject (http://en.wikipedia.org/wiki/Naïve_Bayesian_classification).

Implementation
The Perl implementation (Listing One) uses the hash (associative array) %words to store the word counts for each word and for each category. The hash is stored to disk using a Perl construct called a "tie" that, when used with the DB_File module, results in the hash being stored automatically in a file called "words.db" so that its contents persist between invocations.

use DB_File;
my %words;
tie %words, 'DB_File', 'words.db';

The hash keys are strings of the form category-word: For example, if the word "potato" appears in the category "veggies" with a count of three, there will be a hash entry with key "veggies-potato" and value "3". This data structure contains enough information to compute the probability of a document and do a naïve Bayesian classification.

The subroutine parse_file reads the document to be classified or trained on and fills in a hash that maps each word to the number of times that word appeared in the document. It uses a simple regular expression to extract every 3- to 44-letter word that is followed by whitespace; in a real classifier, this word splitting could be made more complex by accounting for punctuation, digits, and hyphenated words.

sub parse_file
{
    my ( $file ) = @_;
    my %word_counts;
    open FILE, "<$file";
    while ( my $line = <FILE> ) {
        while ( $line =~ s/([[:alpha:]]{3,44})[ \t\n\r]// ) {
            $word_counts{lc($1)}++;
        }
    }
    close FILE;
    return %word_counts;
}

The output of parse_file can be used in two ways: It can be used to train the classifier by learning the word counts for a particular category and updating the %words hash, or it can be used to determine the classification of a particular document.

To train the classifier, call the add_words subroutine with the output of parse_file and a category. In the Perl code, a category is any string, and the classifier is trained by passing sample documents into parse_file and then into add_words:

add_words( <category>, parse_file( <sample document> ) );

sub add_words
{
    my ( $category, %words_in_file ) = @_;
    foreach my $word (keys %words_in_file) {
        $words{"$category-$word"} += $words_in_file{$word};
    }
}

Once document training has been done, the classify subroutine can be called with the output of parse_file on a document. classify will print out the possible categories for the document in order of most likely to least likely:

classify ( parse_file( <document to classify> ) );

sub classify
{
    my ( %words_in_file ) = @_;
    my %count;
    my $total = 0;
    foreach my $entry (keys %words) {
        $entry =~ /^(.+)-(.+)$/;
        $count{$1} += $words{$entry};
        $total += $words{$entry};
    }
    my %score;
    foreach my $word (keys %words_in_file) {
        foreach my $category (keys %count) {
            if (defined($words{"$category-$word"})) {
                $score{$category} +=
                    log( $words{"$category-$word"} / $count{$category} );
            } else {
                $score{$category} += log( 0.1 / $count{$category} );
            }
        }
    }
    foreach my $category (keys %count) {
        $score{$category} += log( $count{$category} / $total );
    }
    foreach my $category (sort { $score{$b} <=> $score{$a} } keys %count) {
        print "$category $score{$category}\n";
    }
}

classify first calculates the total word count ($total) for all categories (which it needs to calculate P(Ci)) and the word count for each category (%count, indexed by category name, which it needs to calculate P(Wj|Ci)). Then classify calculates the score for each category: The score is the value of P(Ci|D). It's preferable to call it a score for two reasons: Ignoring P(D) means that, strictly speaking, the value is being calculated incorrectly, and classify uses logs to reduce overflow errors and replace multiplication by addition for speed. The score is in fact log P(Ci|D), which is:

log P(W0|Ci) + log P(W1|Ci) + ... + log P(Wm-1|Ci) + log P(Ci)

(Recall the equality log(A*B) = log A + log B.) In that log form, it is still suitable for comparison. After the word scores have been accumulated, classify adds log P(Ci) for each category and then sorts the scores in descending order to output the classifier's opinion of the document. classify makes an estimate of the probability for a word that doesn't appear in a particular category by calculating a very small, nonzero probability for that word based on the word count for the category:

$score{$category} += log( 0.1 / $count{$category} );

A small amount of Perl code wraps these three subroutines into a usable classifier that accepts commands to add a document to the word list for a category (and hence, train the classifier), and to classify a document.

if ( ( $ARGV[0] eq 'add' ) && ( $#ARGV == 2 ) ) {
    add_words( $ARGV[1], parse_file( $ARGV[2] ) );
} elsif ( ( $ARGV[0] eq 'classify' ) && ( $#ARGV == 1 ) ) {
    classify( parse_file( $ARGV[1] ) );
} else {
    print <<EOUSAGE;
Usage: add <category> <file> - Adds words from <file> to category <category>
       classify <file>      - Outputs classification of <file>
EOUSAGE
}
untie %words;

If the Perl code is stored in file bayes.pl, then the classifier is trained like this:

perl bayes.pl add veggies article-about-vegetables
perl bayes.pl add fruits article-about-fruits
perl bayes.pl add nuts article-about-nuts

to create three categories (veggies, fruits, and nuts). Asking bayes.pl to classify a document will output the likelihood that the document is about vegetables, fruits, or nuts:

% perl bayes.pl classify article-I-just-wrote
fruits -4.11700258611469
nuts -6.60190923590268
veggies -11.9002266024507

Here, bayes.pl shows that the new article is most likely about fruits.

E-Mail Classification
If you are interested in classifying e-mail, there are a couple of tweaks that improve accuracy in practice: Don't fold case on values from headers, and count words differently if they appear in the subject or the body.

In the aforementioned Perl implementation, there is no difference between the words From, FROM, and fRoM: They are all considered to be instances of from. The parse_file subroutine lowercases the word before counting it. In practical e-mail classifiers, the names of e-mail headers turn out to be a better indicator of the type of an e-mail if case is preserved. For example, the header MIME-Version was written MiME-Version by one piece of common spamming software.

Distinguishing words found in the subject versus the body also increases the accuracy of a naïve Bayesian text classifier on e-mail. The simplest way to do this is to store a word like forward as subject:forward when it comes from the subject line, and simply forward when it is seen in the body.
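The subject-prefix tweak amounts to a one-line change in tokenization. A deliberately simplified Python sketch (real code would parse actual message headers; here the subject and body arrive pre-split):

```python
# Illustrative sketch only: prefix subject-line words so that, say,
# "forward" in a subject and "forward" in a body count as different tokens.
def tokenize(subject, body):
    tokens = ["subject:" + w.lower() for w in subject.split()]
    tokens += [w.lower() for w in body.split()]
    return tokens

print(tokenize("Forward this", "please forward the report"))
# ['subject:forward', 'subject:this', 'please', 'forward', 'the', 'report']
```

The two occurrences of "forward" now train separate counters, which is all the classifier needs to learn that a word is more (or less) spammy when it appears in the subject.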

Performance
The Perl code presented here isn't optimized at all. Each time classify is called, it has to recalculate the total word count for each category, and it would be easy to cache the log values between invocations. The use of a Perl hash will not scale well in terms of memory usage.

However, the algorithm is simple and can be implemented in any language. A highly optimized version of this code is used in the POPFile e-mail classifier to do automatic classification. It uses a combination of Perl and SQL queries. The Bow toolkit from CMU has a fast C implementation of naïve Bayesian classification.

Uses of Text Classification
Although spam filtering is the best-known use of naïve Bayesian text classification, there are a number of other interesting uses on the horizon. IBM researcher Martin Overton has published a paper concerning the use of naïve Bayesian e-mail classification to detect e-mail-borne malware (http://arachnid.home-ip.net/papers/VB2004-Canning-more-than-SPAM-1.02.pdf). In Overton's paper, presented at the Virus Bulletin 2004 conference, he demonstrated that a text classifier could accurately identify worms and viruses, such as W32.Bagle, and that it was able to spot even mutated versions of the worms. All this was done without giving the classifier any special knowledge of viruses.

The POPFile Project is a general e-mail classifier that can classify incoming e-mail into any number of categories. Users of POPFile have reported using its naïve Bayesian engine to classify mail into up to 50 different categories with good accuracy, and one journalist uses it to sort "interesting" from "uninteresting" press releases.

At LISA 2004, four Norwegian researchers presented a paper concerning a system called DIGIMIMIR, which was capable of automatically classifying requests coming into a typical IT help desk and in some cases responding automatically (http://www.digimimir.org/). They use a document clustering approach that, while not naïve Bayesian, is similar in implementation complexity and allowed the clustering together of "similar" e-mails without knowing the initial set of possible topics.

DDJ

Listing One

use strict;
use DB_File;

# Hash keyed by "category-word": $words{"$category-$word"} gives the count
# of 'word' in 'category'. Tied to a DB_File to keep it persistent.
my %words;
tie %words, 'DB_File', 'words.db';

# Read a file and return a hash of the word counts in that file
sub parse_file
{
    my ( $file ) = @_;
    my %word_counts;

    # Grab all the words with between 3 and 44 letters
    open FILE, "<$file";
    while ( my $line = <FILE> ) {
        while ( $line =~ s/([[:alpha:]]{3,44})[ \t\n\r]// ) {
            $word_counts{lc($1)}++;
        }
    }
    close FILE;
    return %word_counts;
}

# Add words from a hash to the word counts for a category
sub add_words
{
    my ( $category, %words_in_file ) = @_;

    foreach my $word (keys %words_in_file) {
        $words{"$category-$word"} += $words_in_file{$word};
    }
}

# Get the classification of a file from word counts
sub classify
{
    my ( %words_in_file ) = @_;

    # Calculate the total number of words in each category and
    # the total number of words overall
    my %count;
    my $total = 0;
    foreach my $entry (keys %words) {
        $entry =~ /^(.+)-(.+)$/;
        $count{$1} += $words{$entry};
        $total += $words{$entry};
    }

    # Run through words and calculate the probability for each category
    my %score;
    foreach my $word (keys %words_in_file) {
        foreach my $category (keys %count) {
            if ( defined( $words{"$category-$word"} ) ) {
                $score{$category} += log( $words{"$category-$word"} /
                                          $count{$category} );
            } else {
                $score{$category} += log( 0.01 / $count{$category} );
            }
        }
    }

    # Add in the probability that the text is of a specific category
    foreach my $category (keys %count) {
        $score{$category} += log( $count{$category} / $total );
    }
    foreach my $category (sort { $score{$b} <=> $score{$a} } keys %count) {
        print "$category $score{$category}\n";
    }
}

# Supported commands are 'add' to add words to a category and
# 'classify' to get the classification of a file
if ( ( $ARGV[0] eq 'add' ) && ( $#ARGV == 2 ) ) {
    add_words( $ARGV[1], parse_file( $ARGV[2] ) );
} elsif ( ( $ARGV[0] eq 'classify' ) && ( $#ARGV == 1 ) ) {
    classify( parse_file( $ARGV[1] ) );
} else {
    print <<EOUSAGE;
Usage: add <category> <file> - Adds words from <file> to category <category>
       classify <file>      - Outputs classification of <file>
EOUSAGE
}

untie %words;

DDJ



It doesn't happen often, but every once in a while, you may encounter an engineering problem that requires the use of elliptic functions for a solution. However, if you mention elliptic functions to a group of engineers, you probably will get a variety of reactions. Most, if not all, will be familiar with the circular and hyperbolic functions and may sneak a quick peek at their calculators to see if they may have missed something. The familiar keys for sine, cosine, and tangent are easily found, along with the keys for calculating the corresponding hyperbolic functions close by. But keys for calculating the elliptic functions are normally not to be found.

Historically, elliptic functions originated during the 1700s in an effort to find mathematical relationships for the ellipse similar to the trigonometric relationships obtained through the study of the circle and hyperbola. The study of elliptic functions was also of practical use in solving problems of that time period, such as finding the oscillation period of a simple pendulum. There are indeed many similarities between the circular, hyperbolic, and elliptic functions that result from the fact that the elliptic functions are a general case, and the circular and hyperbolic functions are special cases of elliptic functions.

The first use of elliptic functions in electrical engineering applications occurred much later, when Wilhelm Cauer applied them during the 1930s in solving a design problem for the German telephone industry. The passive filter developed by Cauer met the same specifications as existing filters while requiring one less inductor. Legend has it that shortly after American telephone engineers learned of Cauer's new design method from reviewing his patent application, every volume on elliptic functions was checked out from the Bell Labs library.

In this article, I examine elliptic functions, starting with definitions of the three fundamental elliptic functions based on a single problem: finding an arc length on the unit ellipse. Based on this problem, I derive the identities relating the three fundamental elliptic functions, their derivatives, and their periodicity. This includes numerical examples that demonstrate the computation of the elliptic functions as well as elliptic integrals of the first and second kinds. Finally, I present C source code for functions based on the numerical examples to demonstrate application of the algorithms.

Rectification of the Unit Ellipse
Figure 1 shows a standard unit ellipse with focal points at k and -k and a major axis of length 1. You begin with the equation of this ellipse:

x^2 + y^2/(1 - k^2) = 1

To find the arc length from Q to P in Figure 1 (also known mathematically as "rectifying the ellipse"), you calculate ds as:

ds = √(dx^2 + dy^2)

Implicit differentiation of the unit ellipse equation yields:

dy = -(1 - k^2) x dx / y = -(1 - k^2) x dx / √((1 - k^2)(1 - x^2))

Therefore, the arc length s from Q to P of the ellipse is calculated as follows:

s = ∫ds = ∫[0 to x] √(1 - k^2 x^2) / √(1 - x^2) dx

The √(1 - x^2) term in the equation for calculating arc length leads to a natural change of variables. Let:

x = sin φ, dx = cos φ dφ

Substituting:

s = ∫ds = ∫[0 to φ] √(1 - k^2 sin^2 φ) dφ

In either of the above forms, the integral equation for calculating the arc length s cannot be expressed in terms of elementary functions. However, based on these results, it is possible to define the arc length of 1/4 of the ellipse as:

E(k) = ∫[0 to π/2] √(1 - k^2 sin^2 φ) dφ

Using this definition, the perimeter of the unit ellipse is 4E(k), where E(k) is by definition an elliptic integral of the second kind, which may only be evaluated numerically.
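Because E(k) must be evaluated numerically, even a simple quadrature rule suffices to find the perimeter. The Python sketch below (my own illustration, not the article's code) applies Simpson's rule to the defining integral; as a sanity check, k = 0 gives the unit circle, whose perimeter 4E(0) must equal 2π:

```python
import math

def E(k, n=10000):
    # Complete elliptic integral of the second kind via Simpson's rule:
    # E(k) = integral from 0 to pi/2 of sqrt(1 - k^2 sin^2(phi)) dphi
    h = (math.pi / 2) / n
    f = lambda phi: math.sqrt(1 - (k * math.sin(phi)) ** 2)
    s = f(0) + f(math.pi / 2)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(i * h)
    return s * h / 3

# k = 0 collapses the ellipse to the unit circle: perimeter 4*E(0) = 2*pi
print(abs(4 * E(0) - 2 * math.pi) < 1e-9)  # True
```

For nonzero k the same call gives the perimeter 4E(k) of the corresponding ellipse; the later examples compute the same family of integrals by iteration instead of quadrature.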

Definitions of the Elliptic Functions
Based on the previous discussion, you may now define the elliptic functions that are used to parameterize the unit ellipse in the same manner that the circular functions parameterize the unit circle. Define sn(u,k), the elliptic sine, and cn(u,k), the elliptic cosine, as follows:

x = sn(u,k) = sin φ
y = √(1 - k^2) cn(u,k) = √(1 - k^2) cos φ

An immediate result of these definitions is that the elliptic sine and cosine obey an identity similar to circular functions:

Numerical Computation of Elliptic Functions

MICHAEL W. PASHEA

Picking up where your scientific calculator leaves off

Michael is a control systems analysis engineer with the Boeing Company and a part-time lecturer in the Department of Electrical and Computer Engineering at Southern Illinois University at Edwardsville. He can be contacted at [email protected].

Page 14: Dr.dobbs.journal.volume.30.Issue.5.Number.372.May.2005 EEn

sn^2(u,k) + cn^2(u,k) = 1

The derivatives of sn(u,k) and cn(u,k) are found by differentiating the parametric equations:

d/du sn(u,k) = d/du sin φ = cos φ (dφ/du) = cn(u,k) (dφ/du)

And:

d/du cn(u,k) = d/du cos φ = -sin φ (dφ/du) = -sn(u,k) (dφ/du)

If you define the elliptic function dn(u,k) as:

dn(u,k) = dφ/du

you have:

d/du sn(u,k) = sn'(u,k) = cn(u,k) dn(u,k)
d/du cn(u,k) = cn'(u,k) = -sn(u,k) dn(u,k)

This definition of dn(u,k) also allows φ to be expressed in terms of u. When expressed in this manner, φ is sometimes called the "amplitude of u" or am(u,k).

φ = am(u,k) = ∫[0 to u] dn(u,k) du

None of the aforementioned definitions place any restrictions on the parameter u in the parametric equations. The parameter u is defined such that:

du/dφ = dφ/ds

In other words, the rate at which u changes with respect to the angle φ is the same as the rate at which the angle changes with respect to the arc length s. The rate of change with respect to the arc length s may be seen from Figure 1 if you envision the point P moving away from point Q while remaining on the unit ellipse. The definition of u is not as easily seen, although it has physical significance in a number of problems, an example being the calculation of the period of a pendulum. It is also interesting to note that for circular functions, dφ/ds = 1. You can conclude that for values of k close to 0, the ellipse is nearly circular and u will approximate the arc length of the ellipse. Otherwise, u represents the cumulative rate at which the angle φ changes with respect to arc length as a function of the angle φ. With this definition for u, you find:

u = F(φ,k) = ∫[0 to φ] dφ / √(1 - k^2 sin^2 φ) = ∫[0 to x] dx / ( √(1 - x^2) √(1 - k^2 x^2) )

Also,

dn(u,k) = dφ/du = √(1 - k^2 sin^2 φ) = √(1 - k^2 sn^2(u,k))

The derivative of dn(u,k) may then be calculated as:

d/du dn(u,k) = dn'(u,k) = -k^2 sn(u,k) cn(u,k)

Periodicity of the Elliptic Functions
Just as with the definition of E(k) given previously, you can define the value of u for 1/4 of the ellipse as follows:

K(k) = F(π/2, k) = ∫[0 to π/2] dφ / √(1 - k^2 sin^2 φ)

Figure 1: Unit ellipse definitions.

Figure 2: Three elliptic functions with k=0.1.

In general, F(φ,k) is known as an "incomplete elliptic integral of the first kind," while K(k) is known as the "complete elliptic integral of the first kind." The only difference between these two integrals lies in the upper limit of integration, which may be any angle for computing F(φ,k) but must be π/2 when calculating K(k). The reason for this definition of K(k) may be seen from Figure 1 with the understanding that K(k) = K is the value of u that corresponds to starting from point Q and moving clockwise along 1/4 of the unit ellipse. The following values of sn(u,k), cn(u,k), and dn(u,k) result:

sn(0,k) = 0     cn(0,k) = 1     dn(0,k) = 1
sn(K,k) = 1     cn(K,k) = 0     dn(K,k) = √(1 - k^2)
sn(2K,k) = 0    cn(2K,k) = -1   dn(2K,k) = 1
sn(3K,k) = -1   cn(3K,k) = 0    dn(3K,k) = √(1 - k^2)
sn(4K,k) = 0    cn(4K,k) = 1    dn(4K,k) = 1

Therefore, for real values of u with 0 < k < 1, the elliptic functions sn(u,k) and cn(u,k) are periodic with period 4K. The function dn(u,k) has a period of 2K. Jacobi proved that if u is imaginary, the period of sn(u,k) becomes j2K', the period of cn(u,k) is 2K + j2K', and the period of dn(u,k) is j4K', where K' = K(√(1 - k^2)). As a result, the elliptic functions are often referred to as "doubly periodic" functions.

Figures 2, 3, and 4 are plots of sn(u,k), cn(u,k), and dn(u,k) for selected values of k = 0.1, k = 0.5, and k = 0.9, respectively.

Summarizing Elliptic Function Properties
There are three fundamental elliptic functions that are defined based on the unit ellipse: sn(u,k), cn(u,k), and dn(u,k). These functions are doubly periodic, with sn(u,k) having a real period of 4K and an imaginary period of j2K', cn(u,k) having a real period of 4K and an imaginary period of 2K + j2K', and dn(u,k) having a real period of 2K and an imaginary period of j4K'. These functions obey the identities:

sn^2(u,k) + cn^2(u,k) = 1
dn^2(u,k) + k^2 sn^2(u,k) = 1
dn^2(u,k) - k^2 cn^2(u,k) = 1 - k^2

The derivatives of the elliptic functions are:

d/du sn(u,k) = cn(u,k) dn(u,k)
d/du cn(u,k) = -sn(u,k) dn(u,k)
d/du dn(u,k) = -k^2 sn(u,k) cn(u,k)

where u is defined by the incomplete elliptic integral of the first kind:

u = F(φ,k) = ∫[0 to φ] dφ / √(1 - k^2 sin^2 φ) = ∫[0 to x] dx / ( √(1 - x^2) √(1 - k^2 x^2) ) = sn^-1(x,k)

The elliptic integral of the second kind has the form:

s = E(φ,k) = ∫[0 to φ] √(1 - k^2 sin^2 φ) dφ = ∫[0 to x] √(1 - k^2 x^2) / √(1 - x^2) dx = ∫[0 to u] dn^2(u,k) du

As seen previously, elliptic integrals of the second kind are encountered when finding the perimeter of an ellipse.

Finally, the circular and hyperbolic functions are special cases of the elliptic functions. In particular:

sn(u,0) = sin(u)     cn(u,0) = cos(u)     dn(u,0) = 1
sn(u,1) = tanh(u)    cn(u,1) = sech(u)    dn(u,1) = sech(u)

Figure 3: Three elliptic functions with k=0.5.

Figure 4: Three elliptic functions with k=0.9.

Numerical Computation
On the surface, computation of the elliptic functions appears to be straightforward because the elliptic sine of u, sn(u,k), is defined as:

sn(u,k) = sin φ

Given u, there should be a corresponding value φ that allows the elliptic sine to be computed. However, finding φ for a given value of u is complicated by the fact that the defining relationship between u and φ cannot be realized by elementary functions. This relationship is defined as:

u = ∫[0 to φ] dφ / √(1 - k^2 sin^2 φ)

Although this integral cannot be expressed in terms of elementary functions, I've shown that under certain circumstances, this integral will reduce to an integral that may be expressed in terms of elementary functions. The two cases of interest are:

1) if k → 0, then u = ∫dφ = φ
2) if k → 1, then u = ∫sec φ dφ = ln(sec φ + tan φ)

You would like to find a series of transforms such that:

u = ∫[0 to φ] dφ / √(1 - k^2 sin^2 φ) = c_1 ∫[0 to φ_1] dφ_1 / √(1 - k_1^2 sin^2 φ_1) = c_1 c_2 c_3 ... c_N ∫[0 to φ_N] dφ_N / √(1 - k_N^2 sin^2 φ_N)

At some point, k_N becomes close enough to either one or zero that you terminate the expansion by approximating the integral as either φ_N or ln(sec φ_N + tan φ_N). This process lets you calculate φ_N in terms of u, but what you really need is φ in terms of u, so that you can compute:

sn(u,k) = sin φ
cn(u,k) = cos φ
dn(u,k) = √(1 - k^2 sin^2 φ)

This means that once φ_N is calculated, you must reverse the process to find the value of φ. The numerical examples I present here are based on a method suggested by Jacobi. The transforms are given without proof. I encourage you to prove that the given substitutions transform the integral as stated.

Example 1
For the first example, assume you want to calculate sn(1.235, 0.5).

Let:

k_n = (1 - √(1 - k_{n-1}^2)) / (1 + √(1 - k_{n-1}^2))

and:

k_n sin φ_n = sin(2φ_{n-1} - φ_n)

then:

u = ∫ dφ_{n-1} / √(1 - k_{n-1}^2 sin^2 φ_{n-1}) = ((1 + k_n)/2) ∫ dφ_n / √(1 - k_n^2 sin^2 φ_n)

With these transforms, you can derive that as n → N, then k_N → 0 and:

u = ( (1 + k_1)(1 + k_2)...(1 + k_N) / 2^N ) φ_N

To find N, you generate Table 1 of values for k_n. You can see that by the 4th iteration, k_4 is effectively zero because of the limitations of the calculator used to calculate k_n. In practice, this process is terminated when a predefined tolerance has been reached. In this case, it is the precision of a Sharp EL-506D scientific calculator.

Now you find that:

φ_N = ( 16 / ( (1.071796769)(1.001292026)(1.000000417) ) ) 1.235

φ_N = 18.41253384 radians

Working backwards, using the equation:

φ_{n-1} = ( φ_n + sin^-1(k_n sin φ_n) ) / 2

You calculate a new table (Table 2) with the corresponding values of φ_n.

From Table 2 you conclude that:

u = 1.235 = ∫[0 to 1.177222534] dφ / √(1 - k^2 sin^2 φ)

where φ is in radians. Therefore, you calculate:

sn(1.235, 0.5) = sin(1.177222534) = 0.923544441
cn(1.235, 0.5) = cos(1.177222534) = 0.383491413
dn(1.235, 0.5) = √(1 - (0.5)^2 sn^2(1.235, 0.5)) = 0.886998546

Example 2
For this example, assume you want to calculate the complete elliptic integral of the first kind, K(0.75). By definition:

K(k) = ∫[0 to π/2] dφ / √(1 - k^2 sin^2 φ)

You can use the substitutions presented in Example 1 with initial values of k_0 = 0.75 and φ_0 = π/2 to calculate the integral as:

K(k) = ( (1 + k_1)(1 + k_2)...(1 + k_N) / 2^N ) φ_N

One problem is quickly encountered, though; the formula used to calculate successive values of φ_n contains φ_n on both sides of the equation:

k_n sin φ_n = sin(2φ_{n-1} - φ_n)

You can solve this by rewriting the aforementioned equation as follows, and applying the trigonometric addition formula for sines:

k_n sin(φ_n - φ_{n-1} + φ_{n-1}) = sin(-(φ_n - φ_{n-1}) + φ_{n-1})

which leads to:

(1 + k_n) tan(φ_n - φ_{n-1}) = (1 - k_n) tan(φ_{n-1})

then:



Table 1: Finding N in Example 1.

n    k_n
0    0.5
1    0.071796769
2    0.001292026
3    0.000000417
4    0

Table 2: Calculating a new table.

n    k_n            φ_n
4    0              18.41253348
3    0.000000417    9.20626674
2    0.001292026    4.603133415
1    0.071796769    2.300924546
0    0.5            1.177222534

Table 3: Values developed in Example 4.

n    k_n            φ_n            m
0    0.35           1.047197551    0
1    0.032657963    2.065650903    1
2    0.000266777    4.131524829    1
3    0.000000017    8.263049642    2
4    0              16.52609928    -

Page 17: Dr.dobbs.journal.volume.30.Issue.5.Number.372.May.2005 EEn

Listing One

/***********************************************************************
 Subroutines for numerical calculation of the elliptic functions sn(u,k),
 cn(u,k), and dn(u,k). Also included are subroutines to calculate the
 first and second kinds of elliptic integrals, F(phi,k), K(k), and E(phi,k).
 Michael W. Pashea 11-1-04
***********************************************************************/
#include <stdio.h>
#include <math.h>

// Define PI, tolerance, and maximum iterations
#define PI 3.14159265358979
#define TOL 0.0000000001
#define MAX_ITERATIONS 10

// Function prototypes - normally these would be in a header file
double find_m( double angle_in_rad );
double k_next( double k );
double phi_next( double phi, double k );
double sn( double u, double k );
double cn( double u, double k );

φ_n = φ_{n-1} + tan^-1[ ((1 - k_n)/(1 + k_n)) tan(φ_{n-1}) ]

From this equation, an interesting relationship between φ_n and φ_{n-1} is seen for the special case φ_0 = π/2. Since tan(π/2) = infinity, you have:

φ_n = 2φ_{n-1} = 2^n φ_0

which means that K(k) may be calculated as:

K(k) = (1 + k_1)(1 + k_2)...(1 + k_N) (π/2)

Therefore:

K(0.75) = (1.203776612)(1.010602528)(1.000028104) (π/2) = 1.910988996

Example 3
Assume that you need to calculate the complete elliptic integral of the second kind, E(π/2, 0.3). The following formula for elliptic integrals of the second kind is due to Legendre, and is presented without proof:

E(φ,k) = a F(φ,k) + b

where:

a = 1 - (k^2/2) ( 1 + k_1/2 + (k_1 k_2)/4 + (k_1 k_2 k_3)/8 + ... )

and:

b = (k_1/(1 + k_1)) sin φ_1 + (k_2/((1 + k_1)(1 + k_2))) sin φ_2 + (k_3/((1 + k_1)(1 + k_2)(1 + k_3))) sin φ_3 + ...

Again, in the special case that φ = π/2, you find that b = 0, and:

E(π/2, k) = a (1 + k_1)(1 + k_2)...(1 + k_N) (π/2)

Starting with k = 0.3, you have k_1 = 0.023573301, k_2 = 0.000138963, and k_3 = 0.000000004. Then a is calculated as a = 0.954469563 and, finally, E(π/2, 0.3) = 1.534833461.
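Example 3 also mechanizes directly: the same k_n iteration produces both the series for a and the product for K(k). A Python sketch of this (my transcription, relying on b = 0 because φ = π/2):

```python
import math

def E_complete(k, tol=1e-12):
    # a = 1 - (k^2/2)(1 + k1/2 + k1*k2/4 + k1*k2*k3/8 + ...)
    # E(pi/2, k) = a * K(k), with K(k) = (1+k1)(1+k2)... * pi/2
    k0 = k
    prod_k = 1.0     # running product k1*k2*...*kn
    series = 1.0     # the bracketed series for a
    prod_1pk = 1.0   # running product (1+k1)(1+k2)...
    half = 1.0
    while k > tol:
        kp = math.sqrt(1 - k * k)
        k = (1 - kp) / (1 + kp)
        prod_k *= k
        half /= 2
        series += prod_k * half
        prod_1pk *= 1 + k
    a = 1 - (k0 * k0 / 2) * series
    bigK = prod_1pk * math.pi / 2
    return a * bigK

print(round(E_complete(0.3), 6))  # 1.534833
```

With k = 0.3 the intermediate values match the article's: k_1 ≈ 0.023573, a ≈ 0.954470, and the final result agrees with E(π/2, 0.3) = 1.534833461.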

Example 4
In this final example, assume you need to calculate the incomplete elliptic integral of the first kind, F(π/3, 0.35). The incomplete elliptic integral F(φ,k) is defined as:

F(φ,k) = ∫[0 to φ] dφ / √(1 - k^2 sin^2 φ)

Evaluating this integral is similar to evaluating the complete elliptic integral in Example 2, except that the upper limit has been changed. During evaluation, use caution when computing φ_N because of the periodic nature of the tangent function. Since tan(φ) = tan(φ - π), the tangent function will always return a value for tan(φ_N - mπ), where m is an integer chosen such that -π/2 ≤ φ_N - mπ ≤ π/2. To extend the range of the tangent function to include values of φ_N > π/2, you must compute the value of m in advance. The formula for calculating φ_N given in Example 2 is augmented to consider the periodicity of the tangent function:

φ_n = φ_{n-1} + mπ + tan^-1[ ((1 - k_n)/(1 + k_n)) tan(φ_{n-1} - mπ) ]

where m is an integer chosen such that:

-π/2 ≤ φ_N - mπ ≤ π/2

Table 3 is developed using the previ-ous definitions of kn and the augmentedformula for φN. Based on this table:

u = F(π/3, 0.35) = (1.032657963)(1.000266777)(1.000000017)(16.52609928)/2^4

From which you calculate that F(π/3,0.35) = 1.066897567.

Conclusion

At the outset, I observed that the elliptic functions sn, cn, and dn are generally not available on a calculator. The numerical examples I provide here not only demonstrate how to calculate these functions using a scientific calculator, but also provide some insight as to why most calculator manufacturers may have chosen not to include them.

I've implemented the algorithm in Example 1 as a C subroutine (Listing One). What stands out about this algorithm is that the value of k must be saved, either in an array or on a stack, for each ascending iteration because it will be used again later in computing the descending iterations of phi. Although the algorithm usually converges to the desired tolerance within four or five iterations, each one of these iterations would require an additional calculator register just to store an iteration of k. In early calculator designs, registers were at a premium, and it is possible that the elliptical functions were just not used often enough to justify the added cost. It is likely that once hardware designs became established without the elliptical functions, it became even less cost effective to add them. Although the other algorithms in Listing One, for the incomplete and complete elliptic integrals, do not require the iterations of k to be saved, it does not make sense to include functions for elliptic integrals without including the elliptic functions.

Finally, the subroutines in Listing One are written to demonstrate implementations of the algorithms discussed in the examples. As a result, there are some limitations. First, the maximum number of iterations has been limited to 10. However, this limit is not enforced within the code. While four or five iterations are sufficient for most values of k, values of k close to 1 may exceed the maximum number of iterations. Second, the special cases of k=0 and k=1 should be trapped to return the circular and hyperbolic functions, respectively. Third, the algorithms assume that u and phi are real. Imaginary or complex arguments have not been considered.

References

Bowman, F. Introduction to Elliptic Functions with Applications. John Wiley and Sons, New York, 1953.

Hancock, Harris. Elliptic Integrals. Dover, New York, 1917.

Abramowitz, Milton and Irene Stegun. Handbook of Mathematical Functions. Applied Mathematics Series, Vol. 55. Washington: National Bureau of Standards, 1964; reprinted 1968 by Dover, New York.

DDJ


30 Dr. Dobb’s Journal, May 2005 http://www.ddj.com


Listing One

#include <stdio.h>
#include <math.h>

/* Header restored for compilation; the opening lines of the listing
   were cut off in this reproduction, so the TOL value is assumed. */
#define PI             3.14159265358979
#define TOL            1.0e-9
#define MAX_ITERATIONS 10

double sn( double u, double k );
double cn( double u, double k );
double dn( double u, double k );
double F_incomplete( double phi_in_rad, double k );
double K_complete( double k );
double E_incomplete( double phi_in_rad, double k );
double E_complete( double k );

// Main program - just print out the data from the examples
int main( void )
{
    printf("The elliptic sine sn(1.235,0.5) ");
    printf("evaluates to: %1.13lf \n", sn(1.235, 0.5) );
    printf("The elliptic cosine cn(1.235,0.5) ");
    printf("evaluates to: %1.13lf \n", cn(1.235, 0.5) );
    printf("The elliptic delta dn(1.235,0.5) ");
    printf("evaluates to: %1.13lf \n\n", dn(1.235, 0.5) );
    printf("The incomplete Elliptic integral F(PI/3,0.35) ");
    printf("evaluates to: %1.13lf \n", F_incomplete(PI/3, 0.35) );
    printf("The complete Elliptic integral K(0.5) ");
    printf("evaluates to: %1.13lf \n\n", K_complete(0.5) );
    printf("The complete Elliptic integral E(0.3) ");
    printf("evaluates to: %1.13lf \n", E_complete(0.3) );
    printf("The incomplete Elliptic integral E(PI/6, 0.5) ");
    printf("evaluates to: %1.13lf \n", E_incomplete(PI/6, 0.5) );
    return 0;
}
double find_m( double angle_in_rad )
{
    return floor( angle_in_rad/PI + 0.5 );
}
double k_next( double k )
{
    k = ( 1 - sqrt( 1-k*k ) )/( 1 + sqrt( 1-k*k ) );
    return k;
}
double phi_next( double phi, double k )
{
    double m;
    m = find_m(phi);
    k = k_next( k );
    phi = phi + (m * PI) + atan( ((1-k)/(1+k))*tan( phi-(m * PI) ) );
    return phi;
}
/* sn(u,k), cn(u,k) and dn(u,k) as demonstrated in Example 1.
   These could easily be combined into one subroutine. */
double sn( double u, double k )
{
    int i, Nmax;
    double kvalue[MAX_ITERATIONS];
    Nmax = 1;
    for (i = 0; i < MAX_ITERATIONS; i++)
        kvalue[i] = 0.0;
    kvalue[0] = k;
    while ( k > TOL )
    {
        k = k_next(k);
        kvalue[Nmax] = k;
        u *= 2/(1+k);
        Nmax++;
    }
    for (i = Nmax-1; i > 0; i--)
        u = (u + asin( kvalue[i] * sin(u) ))/2;
    return sin(u);
}
double cn( double u, double k )
{
    int i, Nmax;
    double kvalue[MAX_ITERATIONS];
    Nmax = 1;
    for (i = 0; i < MAX_ITERATIONS; i++)
        kvalue[i] = 0.0;
    kvalue[0] = k;
    while ( k > TOL )
    {
        k = k_next(k);
        kvalue[Nmax] = k;
        u *= 2/(1+k);
        Nmax++;
    }
    for (i = Nmax-1; i > 0; i--)
        u = (u + asin( kvalue[i] * sin(u) ))/2;
    return cos(u);
}
double dn( double u, double k )
{
    int i, Nmax;
    double kvalue[MAX_ITERATIONS];
    Nmax = 1;
    for (i = 0; i < MAX_ITERATIONS; i++)
        kvalue[i] = 0.0;
    kvalue[0] = k;
    while ( k > TOL )
    {
        k = k_next(k);
        kvalue[Nmax] = k;
        u *= 2/(1+k);
        Nmax++;
    }
    for (i = Nmax-1; i > 0; i--)
        u = (u + asin( kvalue[i] * sin(u) ))/2;
    /* dn = sqrt(1 - k*k*sn*sn); note sin(u) is squared */
    return sqrt( 1 - kvalue[0]*kvalue[0]*sin(u)*sin(u) );
}
// Incomplete elliptic integral of the first kind as in Example 4
double F_incomplete( double phi_in_rad, double k )
{
    double F;
    F = 1.0;
    while ( k > TOL )
    {
        phi_in_rad = phi_next(phi_in_rad, k);
        k = k_next( k );
        F *= (1 + k)/2;
    }
    return F * phi_in_rad;
}
// Complete elliptic integral of the first kind as in Example 2
double K_complete( double k )
{
    double K;
    K = PI/2;
    while ( k > TOL )
    {
        k = k_next( k );
        K *= (1 + k);
    }
    return K;
}
/* Incomplete elliptic integral of the second kind. Discussed in
   Example 3, but not calculated. */
double E_incomplete( double phi_in_rad, double k )
{
    double E, F;
    double a, aterm, asum;
    double bterm, bsum;

    E = 1.0;
    F = F_incomplete(phi_in_rad, k );
    a = (k*k)/2;
    aterm = 1.0;
    asum = 1.0;
    bterm = k;
    bsum = 0.0;
    while ( k > TOL )
    {
        bterm = bterm/k;
        phi_in_rad = phi_next(phi_in_rad, k);
        k = k_next( k );
        aterm *= k/2;          /* k1/2, then k1*k2/4, ... */
        asum += aterm;
        bterm *= k/(1+k);
        bsum += bterm*sin(phi_in_rad);
    }
    E = (1-a*asum)*F + bsum;
    return E;
}
// Complete elliptic integral of the second kind as in Example 3
double E_complete( double k )
{
    double E;
    double a, aterm, asum;

    E = PI/2;
    a = (k*k)/2;
    aterm = 1.0;
    asum = 1.0;
    while ( k > TOL )
    {
        k = k_next( k );
        E *= (1 + k);
        aterm *= k/2;          /* see note in E_incomplete */
        asum += aterm;
    }
    E = (1-a*asum)*E;
    return E;
}

DDJ



Positioning queens on a chess board is one of the classic problems in mathematics and computer science. This long-standing problem goes back even before Carl Gauss (1777–1855), and is based on the chessboard. It is the problem of finding all of the ways to position eight queens on the chessboard so that none of them is under attack by any other. Remember, the queen can move horizontally, vertically, and in the two diagonal directions; for convenience I'll call the direction down and to the right (and its reverse) the diagonal direction, then call the direction up and to the right the antidiagonal direction.

You could approach this problem by looking at all possible ways of placing eight queens in the 64 available cells, and there are 64!/56! ways to do that (the permutations of 64 things taken eight at a time), but you don't have to look at all 1.78E+14 permutations. (If you could figure out a way of just generating the combinations of 64 things taken eight at a time, the number goes down to 4.43E+09.) You can use some natural intelligence because you know that the queens can attack horizontally. That means that you can only have one queen to a row. Because there are eight positions on each of the eight rows, the total candidate configurations is reduced to 8^8 (1.68E+07), a nasty number but not nearly as nasty as the earlier one. You can, however, toss out massive numbers of these.

As a programming exercise, this is a classical problem solved by backtracking. You start by positioning a queen in the top row of the board. As she sits in her cell, you move to the next row and try positioning a queen there. If you find that a queen above is attacking the queen below, you don't have to proceed any farther: All board configurations that include this start will be illegal. So you simply position that queen on the next available cell. You back up to the earlier row when you have finished positioning a queen on all available cells of the current row, and try the next cell in that row for its queen.

Eight queens is a bit challenging to start with, so you can rephrase this as the N-Queens problem: positioning N queens on a grid made up of N rows of N squares to the row. We know that there is no solution for the two-queens problem. Necessarily, each queen is attacking the other either vertically or diagonally. If you play around with paper and pencil, you'll see that there is no possible solution for the three-queens problem either; there is necessarily a diagonal attack. So the four-queens problem is the first one that has a solution. For pseudocode purposes I'm going to use the C and Java conventions of numbering rows from zero on up. This means that when you hit N, you've already positioned N queens.

To position a queen in row j:

1. If j has reached N, you have a valid solution: Process it as valid.
2. Otherwise, for each column position k in this row:
   a. Position a queen in the (j,k) cell.
   b. Check for attack by all the queens above row j.
   c. If there is no attack, position a queen in row j+1.

Listing One implements this algorithm in C. The big question is how to perform the check for attack from above, and this is going to be the source for one of the optimizations. In Fundamentals of Computer Algorithms (Computer Science Press, 1978), Ellis Horowitz and Sartaj Sahni gave a simple test based on the assumption that the board above the current position is valid. They chose to represent the board as a one-dimensional array: Each element represents a row on the board, and the number in that element represents the column position that the queen is occupying. That means that you can easily check for vertical attack: An earlier row has the same column position as your current row. They also noticed that you can easily check for diagonal attack, as shown in the following pseudocode.

To check for a valid board[] filled from row 0 to row j:

1. For each row k from 0 to j–1:
   a. If board[k] = board[j], return False.
   b. Else if abs(board[j] – board[k]) = (j – k), return False.
2. If the loop terminates normally, return True.

Optimal Queens

A classical problem solved by backtracking

TIMOTHY ROLFE

Timothy is a professor of computer science at Eastern Washington University. He can be contacted at [email protected].




Listing Two implements this algorithm in C.

You can easily see that the time required for the check increases with the size of the board. In computer jargon, it is an "order-N" algorithm.

In Algorithms and Data Structures (1986), Niklaus Wirth showed that you can perform the check in constant time, provided that you use some additional arrays to hold information. (His first proposal of this technique, however, was in the April 1971 issue of the Communications of the ACM.) For instance, you can have an array indicating which columns have already been filled. To check whether you can position a queen in a particular column, you just look at that cell of the array to see if it is in use. Similarly, you can have additional arrays to hold information about the diagonals and antidiagonals. You can see that along the diagonals, the difference of the row and column subscripts is a constant, while along the antidiagonals the sum of the row and column subscripts is a constant. Listing Three implements Wirth's algorithm in C.
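Listing Three shows only the O(1) test itself; the complete picture also needs the bookkeeping around the recursive call: mark the column, diagonal, and antidiagonal before descending, and clear them when backtracking. The self-contained counting sketch below is mine, not the article's driver.

```c
/* Count N-Queens solutions using Wirth-style occupancy arrays:
   Col[c] marks a used column, Diag[r-c+Size-1] a used diagonal,
   AntiD[r+c] a used antidiagonal.  Marked before descending,
   cleared again when backtracking. */
long place_count(int Size, int Row, int Col[], int Diag[], int AntiD[])
{
    long n = 0;
    int c;
    if (Row == Size)
        return 1;                      /* a complete, valid board */
    for (c = 0; c < Size; c++) {
        if (Col[c] || Diag[Row - c + Size - 1] || AntiD[Row + c])
            continue;                  /* square is attacked: O(1) test */
        Col[c] = Diag[Row - c + Size - 1] = AntiD[Row + c] = 1;
        n += place_count(Size, Row + 1, Col, Diag, AntiD);
        Col[c] = Diag[Row - c + Size - 1] = AntiD[Row + c] = 0;
    }
    return n;
}
```

Called with zeroed arrays (Col of length Size, Diag and AntiD of length 2*Size–1), it counts the complete solutions.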

Wirth's algorithm dramatically speeds up the processing, requiring only about half the time as compared with the code using Horowitz and Sahni's order-N validity check.

There is, however, another optimization possible. You're positioning only one queen on a row because of the horizontal attack. You know exactly the same thing about the columns: for each column there can be only one queen. This means that all successful solutions are just going to be permutations of the column subscripts: Each successive row has one fewer candidate position than the previous row. Thus, the problem space (before backtracking) has come down from 8^8 to 8!, from 1.68E+07 down to 4.03E+04. All you have to do is generate candidate permutations. As a bonus, the validity check becomes a little easier, since you only need to check for diagonal and antidiagonal attacks.

(In Fundamentals of Algorithmics, Gilles Brassard and Paul Bratley take note of using this method, but they approach it as examining all 8! possible permutations without combining the method with backtracking during permutation generation.)

You first fill the entire array with all of the column positions (a massively illegal configuration with all the queens lined up along the diagonal), but then you march the available column positions through each row cell. You can do this with swaps: After you have evaluated the initial partial permutation, you just swap the value in the front cell with the values in the remaining cells and then evaluate the resulting partial permutation. In the end, you have all the values in the same order except that the value from the last cell is now at the front. So you can regenerate the original configuration by doing a circular leftward rotation in the array. (This was discussed in my article "Backtracking Algorithms," DDJ, May 2004.)

To position a queen in row j:

1. If j has reached N, you have a valid solution. Process it as valid.
2. Otherwise:
   a. Loop as k takes on values from j to N–1:
      i. Swap entries j and k.
      ii. Check for attack by all the queens above row j.
      iii. If there is no attack, position a queen in row j+1.
   b. Restore the initial state of the array:
      i. Save the value in position j.
      ii. As k goes from j+1 to the end, move elements from [k] to [k–1].
      iii. Position the saved value at the end of the array.

Listing Four implements this algorithm in C.

This optimization even more dramatically speeds up the processing, requiring only a fifth of the time as compared with the code using the row-filling implementation. The comparison code uses Horowitz and Sahni's order-N validity check, so the speed-up comes only from the permutation vector optimization.

Combining both optimizations, of course, really speeds up the processing.

The final optimizations are to remove the benchmarking superstructure, and then to use inline code for the most frequently performed operations: marking the Boolean arrays for diagonal attack and performing the validity checks based on those Boolean arrays. This roughly doubles the speed.

Figure 1 shows the time required in a benchmarking run on a Dell desktop computer with a 2-GHz Pentium 4 processor for boards of sizes 12 up to 18. The Excel workbook and code are available electronically (see "Resource Center," page 5). While the figure shows the C language results, the ZIP file also includes the Java implementations of the benchmarking code and of the final optimization code. Note that Figure 1 is using logarithmic scaling for its y-axis, and that the x-axis runs from 12 to 18.

Rejecting Equivalent Solutions

You may want to restrict the acceptable solutions to the unique solutions. Some solutions possess what is called "rotational symmetry": If you rotate the board through 180 degrees (or perhaps even 90 degrees), you end up with exactly the same configuration. Figure 2 shows two solutions, one with the 180-degree rotational symmetry, the other with the 90-degree rotational symmetry. The numbering in the figure shows the queens that are equivalent upon the rotation.

If a solution does not possess such a rotational symmetry, then successive rotations through 90 degrees generate other solutions that will be discovered as you process the solutions. In addition to the rotations, there is another symmetry operation: reflection in a mirror. By the very nature of the N-Queens problem, a valid solution cannot have mirror symmetry. That means that each valid solution also has mirror images that turn up in the processing. Figure 3 shows the solution for the Five-Queens problem that lacks rotational symmetry (Figure 2 shows the


Figure 1: N-Queens optimization results. (Seconds, on a logarithmic y-axis, versus number of queens from 12 to 18, for No Optimization, Wirth's Validity Check, Permutation Vector, Both Optimizations, and Inline Code Optimization.)


symmetric solution). Consequently, there are eight solutions that are equivalent, being simply rotations or reflections of an initial board configuration.

You might think that rejecting equivalent solutions would require searching through previously accepted solutions, but there is an alternative approach that does not require saving the earlier solutions. You are representing the board by an array of column positions. All you need to do is to consider lexicographic ordering of the solutions, thinking of the array as an N-digit number. If you have the rule that you will only accept the first solution in this ordering of equivalent solutions, then the rejection of all the rest is straightforward. For a candidate solution, rotate it by successive 90-degree increments. If at any time the result compares as "smaller," reject the candidate solution. For the mirror images, you can generate one mirror image and then rotate that through three successive 90-degree increments to check for the four mirror images.
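For a board stored as a permutation vector (board[row] = column), a 90-degree rotation has a compact form: a queen at (r,c) moves to (c, N–1–r). The helper below is a sketch with my own signature; the article's Rotate() works in place with scratch space.

```c
/* Rotate a permutation-vector board 90 degrees: the queen in
   row r, column board[r] lands in row board[r], column n-1-r. */
void rotate90(const int board[], int out[], int n)
{
    int r;
    for (r = 0; r < n; r++)
        out[board[r]] = n - 1 - r;
}
```

Applied to the four-queens solution {1,3,0,2}, the rotation returns the same vector, which is exactly the rotational symmetry discussed above.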

Listing Five is the C implementation of these symmetry checks. The function intncmp mimics the Standard C Library function strncmp, but with an array of ints rather than chars.
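A minimal intncmp consistent with that description might look like this; the real listing's version may differ in detail:

```c
/* Lexicographic comparison of two int arrays, mimicking strncmp:
   negative if a < b, zero if equal, positive if a > b. */
int intncmp(const int a[], const int b[], int n)
{
    int i;
    for (i = 0; i < n; i++)
        if (a[i] != b[i])
            return a[i] - b[i];
    return 0;
}
```

The sign convention matches strncmp, which is what the "compares as smaller" test in SymmetryOps relies on.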

The timing results in Figure 1 are from running programs that implement this rejection of candidate solutions equivalent to the first on rotations and reflections.

You might think that the very fastest generation of all solutions to the N-Queens problem would be achieved by stripping out the symmetry checks. There are, however, several issues that need to be faced. Because of the vertical mirror plane, the optimized version of the Nqueens procedure only processes cells in the initial row in the first half of the array. If the mirror images are being counted separately, then you may think you need to go all the way across, doubling the amount of work. You can, however, just go half-way across and then adjust the result. If N is an even number, simply double the result, but if N is an odd number, then you would over-count the number of solutions in which the queen is placed in the center of the first row: You will count the mirror images twice. So you need to detect that case and only count those solutions once.
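The counting rule just described can be sketched as follows. The tiny brute-force per-column counter is my stand-in for the article's optimized solver; only adjusted_total() illustrates the point: double the left-half counts, then add the center column once when N is odd.

```c
#include <stdlib.h>

static int N_;                 /* board size for the helpers below */
static int board_[16];

/* Brute-force stand-in: count completions given rows 0..row-1 */
static long solveFrom(int row)
{
    long n = 0;
    int c, i;
    if (row == N_) return 1;
    for (c = 0; c < N_; c++) {
        int ok = 1;
        for (i = 0; i < row; i++)
            if (board_[i] == c || abs(board_[i] - c) == row - i)
                { ok = 0; break; }
        if (ok) { board_[row] = c; n += solveFrom(row + 1); }
    }
    return n;
}

/* Solutions whose first-row queen sits in the given column */
static long countFrom(int col)
{
    board_[0] = col;
    return solveFrom(1);
}

/* Scan only half of the first row, then adjust for the mirrors */
long adjusted_total(int N)
{
    long total = 0;
    int col;
    N_ = N;
    for (col = 0; col < N / 2; col++)   /* left half only */
        total += countFrom(col);
    total *= 2;                         /* add the vertical mirror images */
    if (N % 2 == 1)
        total += countFrom(N / 2);      /* center column: count only once */
    return total;
}
```

The center-column scan already finds each such solution and its mirror, so it must not be doubled.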

For all this grief, and with a significant loss of information, you get less than a 10 percent speed-up.

Table 1 summarizes the benchmarking results reflected in Figure 1. It gives the times and the time ratios for the benchmarking results running N from 12 to 18 in all six cases.

With enough work, you can sometimes reduce the time required for a calculation by a factor of 20. Too bad there isn't a commercial application for the solutions to the N-Queens problem!

Parallel Threads in Java

The Queens problem belongs to the class of problems called "embarrassingly parallel": The solution to one subproblem (here represented by the queen's position in the first row) is completely independent of another subproblem (a different position in the first row). Consequently, if you have a computer with multiple processors, it is easy to keep them busy solving the problem in parallel by using Java threads. (See, for instance, my article "Bargain-Basement Parallelism," DDJ, February 2003.)

The only requirement is this: You must ensure that each thread works on a different first-row position, and that the threads take turns in building the complete solution (the total number of solutions and of unique solutions). You can do this by taking advantage of the Java keyword synchronized: In the absence of wait(), only one process at a time can execute the entirety of a synchronized method, so that (in operating-system parlance) the method represents a critical section of code.



Listing Six shows the class Board, which controls the initial board positions that the threads work from and also accumulates the sums for total solutions and unique solutions. The synchronized method nextJob() receives the partial results from a thread (receiving zeroes on the first invocation) and sends the column position from which the thread should begin the next job, returning a negative number as the end-of-job message.

Coarser synchronization (waiting for thread completion) is handled by the Java method Thread.join(). To ease the waiting game, you can daisy-chain thread creation: Each thread generates its own child thread until all required threads are active. The main program can then simply execute child.join() and be certain that all threads have completed before it resumes execution, because every thread with a child will itself execute child.join() before terminating.

Listing Seven shows the constructor for the class WorkEngine that extends Thread. The child thread creation and start is part of the constructor itself. This allows the run() method to contain just the logic to dialog with the Board object to work subproblems and then terminate after receiving the end-of-job message and executing child.join(), if appropriate. Listing Eight shows that method.

Table 2 shows the statistics from a number of runs on a quad-processor Xeon computer under Linux. Since the Xeon processor is itself a dual processor, Linux sees eight available 1.5-GHz processors. From that table, you can easily see some


Description                 Time Required   Ratio With No Optimization
No Optimization                 20:59:51         1.000
Wirth's Validity Check           8:51:50         2.369
Permutation Vector               3:52:36         5.416
Both Optimizations               1:54:54        10.965
Inline Code Optimization         1:03:28        19.850
Without Symmetry Checks          0:58:18        21.609

Table 1: Benchmarking results.

Figure 2: Examples of rotational symmetry in solutions: one five-queens board symmetric on 180-degree rotation, the other symmetric on 90-degree rotation.

Figure 3: Set of solutions for N=5 equivalent by symmetry operations: the original board, its 90-, 180-, and 270-degree rotations, and the vertical, horizontal, diagonal, and antidiagonal mirrors.


Listing One

void Nqueens (int Board[], int Trial[], int Size, int Row)
{
    if (Row == Size)
        Process(Board, Size);
    else
        for (int Col = 0; Col < Size; Col++)
        {
            Board[Row] = Col;
            if ( Valid (Board, Size, Row) )
                Nqueens (Board, Trial, Size, Row+1);
        }
}

Listing Two

int Valid (int Board[], int Size, int Row)
{
    for (int Idx = 0; Idx < Row; Idx++)
        if ( Board[Idx] == Board[Row] ||
             abs(Board[Row]-Board[Idx]) == (Row-Idx) )
            return 0;   // boolean false
    return 1;           // boolean true
}

Listing Three

int Valid (int Board[], int Size, int Row,
           int Col[], int Diag[], int AntiD[] )
{
    int Idx; /* Index into Diag[] / AntiD[] */
    int Chk; /* Occupied flag */

    Chk = Col[Board[Row]];
    /* Diagonal: Row-Col == constant */
    Idx = Row - Board[Row] + Size-1;
    Chk = Chk || Diag[Idx];
    /* AntiDiagonal: Row+Col == constant */
    Idx = Row + Board[Row];
    Chk = Chk || AntiD[Idx];
    return !Chk; /* Valid if NOT any occupied */
}

Listing Four

void Nqueens (int Board[], int Trial[], int Size, int Row)
{
    int Idx, Vtemp;

    /* Check for a partial board. */
    if (Row < Size-1)
    {
        if ( Valid (Board, Size, Row) )
            Nqueens (Board, Trial, Size, Row+1);
        for (Idx = Row+1; Idx < Size; Idx++)
        {
            Vtemp = Board[Idx];
            Board[Idx] = Board[Row];
            Board[Row] = Vtemp;
            if ( Valid (Board, Size, Row) )
                Nqueens (Board, Trial, Size, Row+1);
        }
        /* Regenerate original vector from Row to Size-1: */
        Vtemp = Board[Row];
        for (Idx = Row+1; Idx < Size; Idx++)
            Board[Idx-1] = Board[Idx];
        Board[Idx-1] = Vtemp;
    }
    /* This is a complete board.  Final validity check. */
    else if ( Valid (Board, Size, Row) )
        Process(Board, Size);
}

performance degradation if you split the problem into too many parallel threads. By the nature of the problem, some starting board configurations take less time to calculate than others (due to early backtracking). The main program, however, must wait for the slowest thread to complete before it can continue execution. This same waiting means that if the threads end up computing different numbers of starting board configurations, the main cannot continue until the slowest thread is finished.

The Java code and accompanying Excel workbook are also available (in the "Thread" folder) in the ZIP file accessible through the "Resource Center" (see page 5).

Acknowledgments

The benchmarking runs reported here were performed during an academic vacation period on equipment owned by the State of Washington and located within the Computer Science Department at Eastern Washington University. This article contains material first presented at the Small College Computing Symposium at Augustana College (Sioux Falls, South Dakota), 21–22 April 1995. It was published in SCCS: Proceedings of the 28th Annual Small College Computing Symposium (1995), 201–10. The article is available online through http://penguin.ewu.edu/~trolfe/SCCS-95/index.html.

For even more optimizations, see http://www.jsomers.com/nqueen_demo/nqueens.html. I have felt it appropriate, while working on this article, not to work through his solution myself, but he reports a 10-fold speed-up compared with the code available through the SCCS-95 web page referenced earlier.

DDJ


Queens  Threads  UniProc Time  Threaded Time  Speed-Up*  Individual Thread Elapsed Times
14      1           2.887          2.918        0.99     2.912
14      2           2.865          1.662        1.73     1.261 1.656
14      3           2.874          1.294        2.22     1.283 1.274 1.239
14      4           2.848          1.271        2.26     0.678 1.105 1.264 1.211
14      5           2.909          1.008        2.85     0.637 0.840 0.637 0.632 1.002
14      6           2.883          1.039        2.77     0.670 0.650 0.640 0.671 0.623 1.033
14      7           2.855          0.690        4.17     0.660 0.459 0.654 0.638 0.678 0.641 0.584
15      1          19.559         19.332        1.02     19.326
15      2          19.642          9.814        2.00     9.807 9.584
15      3          19.627          9.028        2.18     8.372 9.021 7.215
15      4          19.573          7.734        2.54     5.335 7.690 7.586 4.997
15      5          19.640          6.495        3.02     4.005 5.047 3.858 6.479 5.926
15      6          19.614          6.302        3.12     3.874 3.888 3.912 6.289 3.881 6.033
15      7          19.627          4.584        4.28     4.052 3.895 3.956 4.095 3.673 3.509 4.578
15      8          19.817          3.965        4.95     3.617 3.883 3.949 3.774 3.548 3.894 3.918 3.146
16      1         122.277        122.852        1.00     122.846
16      2         122.381         62.527        1.96     62.520 61.145
16      3         122.623         51.567        2.38     36.502 51.560 44.948
16      4         122.578         44.849        2.74     44.819 30.552 36.919 37.233
16      5         122.757         32.535        3.77     31.549 32.496 32.517 21.761 19.873
16      6         122.718         33.373        3.68     32.387 21.243 23.716 33.357 21.909 19.660
16      7         122.606         32.415        3.78     24.999 23.875 22.509 24.968 32.361 21.947 19.909
16      8         123.360         28.887        4.25     24.026 24.395 27.755 28.855 25.141 22.993 22.309 19.891

Table 2: Thread Timing Results. *Speed-up is the average uniprocessor time divided by the threaded time.


Listing Five

/* Check the symmetries.  Return 0 if this is not the 1st   */
/* solution in the set of equivalent solutions; otherwise   */
/* return the number of equivalent solutions.               */
int SymmetryOps(
    int Board[], /* The fully-populated board        */
    int Trial[], /* Used for symmetry checks;        */
                 /* holds its own scratch space too! */
    int Size)    /* Number of cells in a row/column  */
{
    int Idx;     /* Loop variable; intncmp result */
    int Nequiv;  /* Number of equivalent boards   */
    int *Scratch = &Trial[Size]; /* Scratch space */

    /* Copy; Trial will be subjected to the transformations */
    for (Idx = 0; Idx < Size; Idx++)
        Trial[Idx] = Board[Idx];

    /* 90 degrees --- clockwise (4th parameter of Rotate is FALSE) */
    Rotate (Trial, Scratch, Size, false);
    Idx = intncmp (Board, Trial, Size);
    if (Idx > 0) return 0;
    if ( Idx == 0 )      /* No change on 90 degree rotation */
        Nequiv = 1;
    else                 /* 180 degrees */
    {
        Rotate (Trial, Scratch, Size, false);
        Idx = intncmp (Board, Trial, Size);
        if (Idx > 0) return 0;
        if ( Idx == 0 )  /* No change on 180 degree rotation */
            Nequiv = 2;
        else             /* 270 degrees */
        {
            Rotate (Trial, Scratch, Size, false);
            Idx = intncmp (Board, Trial, Size);
            if (Idx > 0) return 0;
            Nequiv = 4;
        }
    }
    /* Copy the board into Trial for the reflection checks */
    for (Idx = 0; Idx < Size; Idx++)
        Trial[Idx] = Board[Idx];
    /* Reflect -- vertical mirror */
    Vmirror (Trial, Size);
    Idx = intncmp (Board, Trial, Size);
    if (Idx > 0) return 0;
    if ( Nequiv > 1 )     // I.e., no four-fold rotational symmetry
    {
        /* -90 degrees --- equiv. to diagonal mirror */
        Rotate (Trial, Scratch, Size, true);
        Idx = intncmp (Board, Trial, Size);
        if (Idx > 0) return 0;
        if ( Nequiv > 2 ) // I.e., no two-fold rotational symmetry
        {
            /* -180 degrees --- equiv. to horizontal mirror */
            Rotate (Trial, Scratch, Size, true);
            Idx = intncmp (Board, Trial, Size);
            if (Idx > 0) return 0;
            /* -270 degrees --- equiv. to anti-diagonal mirror */
            Rotate (Trial, Scratch, Size, true);
            Idx = intncmp (Board, Trial, Size);
            if (Idx > 0) return 0;
        }
    }
    /* WE HAVE A GOOD ONE! */
    return Nequiv * 2;   /* Double to handle the mirror images */
}

Listing Six

public class Board
{
    private int nSoln = 0,   // Total solutions for this board
                nUniq = 0;   // Unique solutions, rejecting ones
                             // equivalent based on rotations.
    private int size,        // Board size AND number of queens
                limit,       // First row mid-point
                nextCol = 0; // Next position to be computed

    public Board (int size)
    {
        this.size = size;
        limit = (size+1) / 2;   // Mirror images done automatically
    }

    // Accumulate partial results and assign the next problem.
    // Synchronized because this is the critical section ---
    // only one thread allowed in at a time.
    public synchronized int nextJob ( int nS, int nU )
    {
        nSoln += nS;
        nUniq += nU;
        // If all columns have been assigned, return the exit flag
        return nextCol < limit ? nextCol++ : -1;
    }

    // Return the saved information on total solutions
    public int total()
    {   return nSoln;   }

    // Return the saved information on unique solutions
    public int unique()
    {   return nUniq;   }
}

Listing Seven

public WorkEngine(int size, int nMore, Board info)
{
    this.size = size;
    this.info = info;
    board   = new int[size];
    trial   = new int[size];
    scratch = new int[size];
    diagChk = new boolean[2*size-1];
    antiChk = new boolean[2*size-1];
    if ( nMore > 0 )
        try
        {
            child = new WorkEngine( size, nMore-1, info );
            child.start();
        }
        catch ( Exception e )
        {   System.out.println(e);   }
    else
        child = null;
}

Listing Eight

public void run()
{
    int nextCol;
    long start = System.currentTimeMillis();

    while ( true )   // Will break out on -1 for column posn.
    {
        int row, col;

        // On the first call, nTotal and nUnique hold zeroes.
        nextCol = info.nextJob(nTotal, nUnique);
        if ( nextCol < 0 )
            break;
        // Empty out counts from the last board processed
        nTotal = nUnique = 0;
        // Generate the initial permutation vector, given nextCol
        board[0] = nextCol;
        for ( row = 1, col = 0; row < size; row++, col++ )
            board[row] = col == nextCol ? ++col : col;
        // Empty out the diagChk and antiChk vectors
        for ( row = 0; row < 2*size-1; row++ )
            diagChk[row] = antiChk[row] = false;
        // Mark as in use the diagonal and antidiagonal
        diagChk[size-1-nextCol] = antiChk[nextCol] = true;
        // Now compute from row 1 on down.
        nQueens (1);
    }
    if ( child != null )
        try
        {   child.join();   }
        catch ( Exception e )
        {   System.out.println(e);   }
}

DDJ

http://www.ddj.com Dr. Dobb’s Journal, May 2005 37


When developing the Top Secret Journal procedure for my Top Secret Crypto Gold communications and file-encryption program, I wanted to display the contents of the journal in a tree-view control that users could use to select and display a specific page of the journal. I needed it sorted by two major categories: By Date and By Keyword. Within each major category, I needed it sorted by keywords and then by dates within keywords. Each journal page would have one entry in the By Date category, and up to six entries under the By Keyword category; see Figures 1 and 2.

To do this, I create an index file each time a journal is opened, with one entry for each journal page under the By Date category, and up to six entries for each journal page in the By Keyword category. All By Date category entries have their KeyWord entry set to the single character A for sorting purposes. Each journal page has six KeyWord entries that you can enter when the page is created or edited, so you can group pages with similar content together in the tree-view control. I then have to sort the index on three different fields to get it into the order required to create the entries for the tree-view control. Looking at the index file structure (Listing One), I have to sort on the Type_Entry field, which is a word value, then on the KeyWord field, which is a null-terminated string, and finally on the dwCreated field, which consists of two dwords, most significant dword followed by least significant dword. (The dwCreated field contains the date and time the journal page was created as the number of seconds since midnight 1 January 1970, which makes it easy to sort.)

I chose to use the shell sort algorithm and modify it to suit my needs because, overall, it usually turns in the best times for worst and average cases, with the worst case beating the average case for sorting time. I also needed to modify the algorithm to sort on byte, word, or dword arrays and null-terminated strings, and on a single signed byte, word, or dword. And finally, I wanted to be able to sort in ascending or descending order, and forward or backward within a field. The basic shell sort algorithm proved easily modifiable to accomplish all of this.

Shell Sort Template Structure

To provide a template for the modified shell sort algorithm to follow, I created a shell sort template structure (Listing Two) to instruct the shell sort algorithm how to sort the data. The first item in the structure is TYPE_SORT, which can be set to SORT_BYTES, SORT_WORDS, or SORT_DWORDS for unsigned data; SORT_STRINGS for null-terminated strings; and SORT_SBYTES, SORT_SWORDS, and SORT_SDWORDS for signed data. Sorting signed data is limited to a single byte, word, or dword in a field; otherwise, it will not sort properly. The second item in the structure is COMPARE_DIRECTION, which can be set to FORWARD or BACKWARD and lets you compare from the low to the high address in a field or from the high to the low address. The third item in the structure is RECORD_SIZE, which holds the size of each record to sort. The fourth item in the structure is COMPARE_SIZE. If the field you want to compare is 1 byte long, or one word long, or

A Multifield Single-Pass Shell Sort Algorithm

An algorithm for all of your sorting needs

MACGREGOR K. PHILLIPS

MacGregor is retired from the U.S. Navy and currently resides in the Philippines. He can be contacted at http://www.topsecretcrypto.com/.


“Most of the shell sort algorithm is written in 80x86 assembly language”


one dword long, set COMPARE_SIZE to 1. If you have a 30-byte field to compare, set it to 30. If you have a two-word field to compare, set it to 2; if you have a four-dword field to compare, set it to 4. The fifth field in the structure is COMPARE_OFFSET. This is the zero-based offset, within the record, of the start of the field you want to compare. If you are comparing the field in the FORWARD direction, this is the zero-based offset of the first byte, word, dword, or string within the record. If you are comparing the field in the BACKWARD direction, this is the offset to the last byte, word, or dword in the field. You cannot compare null-terminated strings in the backward direction. (Refer to Sort.h, available electronically; see "Resource Center," page 5.)
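As a sanity check on these hand-computed offsets, a short C sketch (my own illustration, not part of the article's code) can reproduce them with offsetof() against the leading fields of Listing One's DIARYENTRY layout:

```c
#include <stddef.h>

/* Cut-down prefix of the DIARYENTRY layout from Listing One, using
   plain C types in place of the Windows BYTE/WORD typedefs. This is
   illustrative only, to confirm the template's hard-coded offsets. */
typedef struct {
    unsigned char  Date_Time[32];   /* bytes  0..31                   */
    unsigned char  Year[6];         /* bytes 32..37                   */
    unsigned short Type_Entry;      /* offset  38 -> first sort field  */
    unsigned char  KeyWord[132];    /* offset  40 -> second sort field */
    unsigned int   dwCreated[2];    /* offset 172 -> third sort field  */
} DiaryPrefix;
```

With the usual 1/2/4-byte alignment rules, offsetof(DiaryPrefix, Type_Entry), offsetof(DiaryPrefix, KeyWord), and offsetof(DiaryPrefix, dwCreated) yield the 38, 40, and 172 used in Listing Three; deriving the values this way avoids silent breakage if the record layout ever changes.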

Because I want to sort my index file on three different fields, I extend the shell sort template structure to accommodate three different sort fields. The final item in the structure is the END_MARKER, which must be set to SORT_END (-1). This informs the sort algorithm that there are no more fields to sort on. If you have different sorting tasks within a program that call for only one or two fields to be sorted, you can place the end marker in any TYPE_SORT item, which informs the sort algorithm that there are no more fields to sort on. You can extend this structure to accommodate any number of fields.

Filling in a sort template structure is easy. Following the structure of the journal index file (Listing One), I set up a sort template for each of the three fields I want to sort on using the shell sort template structure (Listing Two). The first field to sort on is the Type_Entry field. Since the type of this field is a word, TYPE_SORT is set to SORT_WORDS, COMPARE_DIRECTION is set to FORWARD, RECORD_SIZE is set to sizeof(DIARYENTRY), COMPARE_SIZE is set to 1, and COMPARE_OFFSET is set to 38 (Listing Three).

The second field to sort on is KeyWord, which is a null-terminated string up to 130 bytes long. Because of this, TYPE_SORT is set to SORT_STRINGS, COMPARE_DIRECTION is set to FORWARD, RECORD_SIZE remains at sizeof(DIARYENTRY), COMPARE_SIZE is set to 130, and COMPARE_OFFSET is set to 40. I know the size of the KeyWord field is 132, but I leave myself a little bit of wiggle room.

The third field to sort on is dwCreated, which contains the date and time the journal entry was created as a 64-bit value in two dwords, the most significant dword first. In the sort template structure, TYPE_SORT is set to SORT_DWORDS, COMPARE_DIRECTION is set to FORWARD, RECORD_SIZE remains at sizeof(DIARYENTRY), COMPARE_SIZE is set to 2, and COMPARE_OFFSET is set to 172.

The END_MARKER in the sort template structure is set to SORT_END, which tells the sort algorithm that there are no more fields to sort on.

Shell Sort Algorithm

Most of the shell sort algorithm is written in 80x86 assembly language for 80386 or better processors. The rest is written in Microsoft Visual C 6.0 using Windows API functions for the Win32 environment. Most of the Windows API functions can be converted to Standard C functions to make the sort algorithm more generic.

For the sort algorithm to sort the KeyWord strings in any language, thereby displaying the sorted keywords in the proper order according to the sort criteria for each language, I chose to use the Windows API CompareString function with the locale set to LOCALE_USER_DEFAULT and dwCmpFlags set to NORM_IGNORECASE.

The one Windows API function that would be hard to replace is CompareString. To sort strings for any language, you need a locale that specifies how strings in the user's language are sorted. The CompareString function gives you this, while the Standard C runtime library compare function does not.

I follow the standard shell sort algorithm by dividing the list into two partitions of equal size, comparing each element in the first partition with the corresponding element in the second, and swapping them if necessary. I then divide each of these partitions into two partitions and proceed as above. When the partition size reaches zero, the sort is completed.
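The halving scheme just described is the classic gap sequence of Shellsort. For reference, here is a minimal generic C version over an int array (my own sketch, not the article's code; the article's assembly version instead compares whole records through the sort template, and the standard formulation shown here also lets an element keep moving back through its gap subsequence after a swap):

```c
#include <stddef.h>

/* Generic gap-halving Shellsort over an int array. Illustrative only:
   the article's version generalizes the comparison step to arbitrary
   record fields driven by the sort template structure. */
static void shell_sort(int *a, size_t n)
{
    for (size_t gap = n / 2; gap > 0; gap /= 2) {
        /* Compare elements one gap apart, letting each element move
           toward the front of its gap subsequence. */
        for (size_t i = gap; i < n; i++) {
            for (size_t j = i; j >= gap && a[j - gap] > a[j]; j -= gap) {
                int tmp = a[j];
                a[j] = a[j - gap];
                a[j - gap] = tmp;
            }
        }
    }
}
```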

What I have added to the shell sort algorithm is the ability to sort on different types of fields, and to sort on more than one field in a single call to the sort procedure. In the heart of the sort procedure, I have included a while (TRUE) loop where all of this is accomplished (Listing Four). (See Sort.c, available electronically.)

At the beginning of the while (TRUE) loop, I set up the address of the shell sort template structure in edx, and the offset to the current sort field in the shell sort structure in ebx. The addresses for the two records in the partitions are then placed in esi and edi. The compare offset for the field we want to sort within the record is


Figure 1: Journal pages can have one entry in the By Date category.

Figure 2: Journal pages can have up to six entries under the By Keyword category.


then added to esi and edi, and these addresses are placed in temporary storage locations. The size of the field we are comparing is then placed in ecx. I then determine if we are sorting a signed field by testing bit 7 of TYPE_SORT in the sort template structure. I use the btr instruction, which copies the bit to the carry flag and resets (clears) the bit in TYPE_SORT. It is necessary to clear this bit so you can easily determine the type of field you are sorting on: bytes, words, dwords, or strings. I then set the signed flag using the setc instruction, which sets it to 1 if the carry bit is set, and 0 if not.

Next, I determine the direction to sort the field in by testing the value in COMPARE_DIRECTION. If the direction to compare is BACKWARD, I use the std instruction to set the direction flag so all subsequent string instructions will process down, from high addresses to low addresses. I then test TYPE_SORT to determine the type of field I am sorting: bytes, words, dwords, or strings. If I am sorting bytes, words, or dwords, I use the standard rep cmpsb, rep cmpsw, or rep cmpsd instructions to compare the fields. Once the comparison is complete, I reset the direction flag using the cld instruction, which places the processor back in its default mode of processing subsequent string instructions up, from low addresses to high addresses. The result of the comparison is then stored in the two temporary variables Above and Below, respectively. If the comparison was performed on unsigned fields, I use the seta and setb instructions; if on signed fields, I use the setg and setl instructions, and bit 7 is set in the TYPE_SORT field using the bts instruction for the next comparison.

If the sorting is performed on null-terminated strings, all of the registers are first saved on the stack using the pushad instruction. The CompareString Windows API function is used with the locale set to LOCALE_USER_DEFAULT and dwCmpFlags set to NORM_IGNORECASE, which tells the function to ignore the case of the letters when sorting. The character count variable for each string to compare is set to -1, which tells the function to treat the strings as null terminated, and the length of each string is calculated automatically. The Above and Below variables are then set as determined by the outcome of the comparison, and all of the registers are restored from the stack using the popad instruction.

If the compared items are not equal, the procedure jumps out of the loop to determine if they should be swapped. If the compared items are equal, a check is made to see if there are any more fields to compare. This is done by adding 20 to the ebx register, which is the length of one set of sort parameters in the sort template structure. If the start of the next set of sort parameters contains SORT_END, there are no more fields to sort on; therefore, the function jumps out of the loop to determine if the items should be swapped, which they will not be because they are equal. If it does not contain SORT_END, we have another field to sort on. The sort function then returns to the start of the loop and sets up and compares the next field in the two records. This allows the procedure to sort a set of records on any number of fields you want.

For example, if you had a large database of customers and you wanted to sort them by zip code, telephone area code, last name, first name, and middle initial, in that order, all you have to do is set up the sort template structure for these five fields and call the sort procedure to sort them. One call is all it takes.
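The same field-by-field fallthrough can be expressed in portable C with a qsort comparator. The three-field record below is hypothetical (it is not the article's DIARYENTRY, and the field names are my own), but the comparison chain mirrors what the sort template drives the assembly core to do:

```c
#include <string.h>

/* Hypothetical customer record; field names are illustrative only. */
typedef struct {
    int  zip;
    int  area_code;
    char last_name[32];
} Customer;

/* Compare field by field, falling through to the next field only when
   the current fields are equal -- the behavior the sort template
   structure describes declaratively. */
static int cmp_customer(const void *pa, const void *pb)
{
    const Customer *a = (const Customer *)pa;
    const Customer *b = (const Customer *)pb;
    if (a->zip != b->zip)
        return a->zip < b->zip ? -1 : 1;
    if (a->area_code != b->area_code)
        return a->area_code < b->area_code ? -1 : 1;
    return strcmp(a->last_name, b->last_name);
}
```

A single qsort(customers, n, sizeof(Customer), cmp_customer) then sorts on all the fields at once, just as one SortMyFile call does with a filled-in template.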

To sort a file, you make a call to the SortMyFile function and supply a pointer to the name of the file, a handle to the opened file, the number of bytes (if any) in the file header, a pointer to the sort template structure, and the direction you


Listing One

// Journal Entry Index File Structure
typedef struct _DIARYENTRY
{
    BYTE  Date_Time[32];        // Date time as a string.
    BYTE  Year[6];              // Year as string.
    WORD  Type_Entry;           // 1 = date entry, 2 = keyword entry.
    BYTE  KeyWord[132];         // Keyword or "A" for date entry.
    DWORD dwCreated[2];         // Date time created - msd to lsd.
    SYSTEMTIME st;              // Local time created.
    ULARGE_INTEGER uliOffset;   // Offset of entry in file.
} DIARYENTRY, *LPDIARYENTRY;

Listing Two

// Shell Sort Template Structure
typedef struct _SORT_TEMPLATE
{
    DWORD TYPE_SORT;            // Type of sort.
    DWORD COMPARE_DIRECTION;    // FORWARD or BACKWARD.
    DWORD RECORD_SIZE;          // Record size.
    DWORD COMPARE_SIZE;         // Sort field size.
    DWORD COMPARE_OFFSET;       // Offset of field in record to sort.
    DWORD TS;                   // Start of 2nd sort or -1 for end marker.
    DWORD CD;
    DWORD RS;
    DWORD CS;
    DWORD CO;
    DWORD TS1;                  // Start of 3rd sort or -1 for end marker.
    DWORD CD1;
    DWORD RS1;
    DWORD CS1;
    DWORD CO1;
    DWORD END_MARKER;           // Must be set to -1.
} SORT_TEMPLATE, *LPSORT_TEMPLATE;

Listing Three

// Diary Sort Template
SORT_TEMPLATE DiarySort =
{
    SORT_WORDS,   FORWARD, sizeof(DIARYENTRY),   1,  38,
    SORT_STRINGS, FORWARD, sizeof(DIARYENTRY), 130,  40,
    SORT_DWORDS,  FORWARD, sizeof(DIARYENTRY),   2, 172,
    SORT_END
};

Listing Four

// Sort Algorithm Core.
while (TRUE)
{
    __asm
    {
        // Pointer to sort parameter structure.
        mov edx,dwTempEDX
        // Offset to current sort parameter in structure.
        mov ebx,dwTempEBX
        // Setup the records to compare.
        mov esi,dwIndexB                // Bottom index.
        mov edi,dwIndexC                // Center index.
        // Point to the part of the record to compare.
        add esi,dword ptr [edx][ebx].COMPARE_OFFSET
        add edi,dword ptr [edx][ebx].COMPARE_OFFSET
        mov lpRecordB,esi
        mov lpRecordC,edi
        // Size of the data to compare. If bytes, the number of bytes; if
        // words, the number of words; if dwords, the number of dwords.
        mov ecx,dword ptr [edx][ebx].COMPARE_SIZE
        // See if we are doing a signed comparison.
        btr dword ptr [edx][ebx].TYPE_SORT,7
        setc Signed
        // Setup the direction for the comparison.
        cmp dword ptr [edx][ebx].COMPARE_DIRECTION,BACKWARD
        jne L2
        // Comparison is performed backwards - high address to low.
        std
        // Compare bytes.
L2:     cmp dword ptr [edx][ebx].TYPE_SORT,SORT_BYTES
        jne L3
        repe cmpsb
        jmp L5
        // Compare words.
L3:     cmp dword ptr [edx][ebx].TYPE_SORT,SORT_WORDS
        jne L4
        rep cmpsw
        jmp L5
        // Compare dwords.
L4:     cmp dword ptr [edx][ebx].TYPE_SORT,SORT_DWORDS
        jne L7                          // Default to sort strings.
        repe cmpsd
        // Make sure - reset direction flag to low to high address.
L5:     cld
        // Set the flags depending on if we sorted signed fields or not.
        pushfd
        cmp Signed,1
        je L6
        popfd
        seta Above
        setb Below
        jmp L8
L6:     popfd
        setg Above
        setl Below
        // Reset the signed bit in the TYPE_SORT parameter for next record.
        bts dword ptr [edx][ebx].TYPE_SORT,7
        jmp L8
        // Save all of our registers.
L7:     pushad
    }
    // Compare strings using user default settings.
    iCompareResult = CompareString(LOCALE_USER_DEFAULT, NORM_IGNORECASE,
                                   lpRecordB, -1, lpRecordC, -1);
    Above = 0;
    Below = 0;
    if (iCompareResult == CSTR_LESS_THAN)
    {
        Below = 1;
    }
    else if (iCompareResult == CSTR_GREATER_THAN)
    {
        Above = 1;
    }
    __asm
    {
        popad
        // Break if the fields are not equal.
        cmp iCompareResult,CSTR_EQUAL
L8:     jne CheckSwap
        // Break if the fields are equal and we have no more fields to sort
        // on; else continue sorting the record on the next field.
        add ebx,20                      // Size of 1 sort parameter.
        mov dwTempEBX,ebx
        cmp dword ptr [edx][ebx].TYPE_SORT,SORT_END
        je CheckSwap
    }
} // while TRUE


want to sort the data in, either ASCENDING or DESCENDING. An example of the call for sorting my journal index file is:

SortMyFile((LPTSTR)&szIndexTemp, hIndexTemp, 0, &DiarySort, ASCENDING);

This function determines if you have enough memory to read and sort the file in memory (much faster), or if it will be sorted on disk. The dwHeaderBytes variable lets you tell the sort procedure to ignore the file header when sorting the file. If the file does not have a header, set it to 0. If the file is to be sorted on disk, the SortFileOnDisk function is called. If it is to be sorted in memory, the SortFileInMemory function is called. This function reads the complete file into memory and calls the ShellSort function to sort the data. Once the data is sorted, you return to the SortFileInMemory function and the sorted data is written back to disk.

The ShellSort function can also be called independently if your program needs to sort a table of data that resides solely in memory. (See Sort.c and Support.c, available electronically.)

I have commented out some of the code that contains the error-reporting functions used by my program and replaced it with simple MessageBox calls. This allows you to insert your own error procedures as required by the programs you write. This is the reason the SortMyFile function takes the name of the file being sorted: my error procedure displays the name so you will know what file an error occurred on.

Currently, the sort procedure is restricted to handling files of 4 GB or less. With the advent of 64-bit computers, it could easily be modified to handle 2^64-byte files using 64-bit registers. One way around this restriction is the use of index files. While the index files you want to sort would be limited to 4 GB or less, the actual database file each one indexes could grow to be much larger.

DDJ


A graph is a data structure encountered frequently in algorithmics, whenever we must be able to represent a set of objects and relationships between pairs of objects. A graph consists of vertices to represent the objects and edges that join pairs of vertices. The vertices of a graph are typically depicted using points or small circles or squares, and each edge is drawn as a line or curve that connects the two endpoint vertices of the edge. A graph is planar if it can be depicted on a flat surface in such a way that the vertices are at distinct locations and no two edges intersect except at common endpoints.
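Before running a full planarity test, a standard necessary (but not sufficient) condition derived from Euler's formula can rule many graphs out cheaply: a simple planar graph with v >= 3 vertices has at most 3v - 6 edges. The helper below is my own illustration of that prefilter; it is not part of the article's algorithm or its reference implementation:

```c
/* Necessary condition from Euler's formula: a simple planar graph
   with v >= 3 vertices can have at most 3*v - 6 edges. Returns 1 if
   the edge count alone already rules out planarity. */
static int exceeds_planar_edge_bound(long v, long e)
{
    if (v < 3)
        return 0;               /* the bound does not apply */
    return e > 3 * v - 6;
}
```

K5 (v = 5, e = 10) exceeds the bound of 9 and so is nonplanar; K3,3 (v = 6, e = 9) passes the bound yet is still nonplanar, which is why the bound is only a prefilter and a real planarity algorithm is still needed.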

Planarity is an important category in graph theory with numerous applications. For example, given a graph representing a circuit, with vertices representing logic gates and edges representing wires connecting them, the circuit can be embedded on a chip or circuit board without any short-circuits if, and only if, the graph is planar. Given a graph representing a web site, with vertices for the web pages and edges for the hyperlinks, if the graph is planar, then a disambiguated web site map (with no edge crossings) can be presented on the computer screen. Moreover, in these applications, if the graph is not planar, then it is useful to be able to obtain a minimal nonplanar subgraph so that some method can be used to "fix up" or specially mark an edge crossing, then try again to see if the modified graph is planar (and iteratively perform more fixes until planarity is achieved).

Interestingly, the issue of how to render a graph that has been found to be planar is typically treated as a separate problem, in part because what makes a good drawing is application-dependent; for example, a good layout for a circuit may not make a pleasing web site map rendition. Moreover, there are numerous graph-drawing algorithms that are tailored to satisfy various parameters, such as ease of creation, tightness of physical space usage, and so forth. (For more information, see Graph Drawing: Algorithms for the Visualization of Graphs, by Ioannis G. Tollis, et al., Prentice Hall, 1998.)

In this article, I focus on the underlying combinatorial problem of determining whether the graph is planar. This includes an examination of the basic ideas and an overview of a new "edge addition" algorithm, which Wendy Myrvold and I jointly created. More rigorous technical information may be found in the scientific paper to appear in the Journal of Graph Algorithms and Applications (http://www.cs.brown.edu/publications/jgaa/). I also present in this article the main functions of a reference implementation.

The Effect of Adding an Edge

The new planarity algorithm adds each edge of the input graph G to an embedding data structure G~ that maintains the set of biconnected components that develop as each edge is added. As each new edge is embedded in G~, it is possible that two or more biconnected components will be merged together to form a single, larger biconnected component.

Figure 1 illustrates the graph-theoretic basis for this strategy. In Figure 1(a), you see a connected graph that contains a cut vertex r whose removal, along with its incident edges, separates the graph into the two connected components shown in Figure 1(b). Thus, the graph in Figure 1(a) is represented in G~ as the two biconnected components in Figure 1(c). Observe that the cut vertex r is represented in each biconnected component that contains it. Observe also that the addition of a single edge (v, w) with endpoints in the two biconnected components results in the single biconnected component depicted in Figure 1(d). Since r is no longer a cut vertex, only one vertex is needed in G~ to represent it.

Indeed, Figure 1(d) illustrates the fundamental operation of the edge addition planarity algorithm. A single edge biconnects previously separable biconnected components, so these are merged together when the edge is embedded, resulting in a single larger biconnected component B. Moreover, the key constraint on this edge addition operation is that any vertex in B must remain on the outside of B if it must be involved in the future embedding of an edge, because new edges

Planarity by Edge Addition

A new approach to a foundational problem in computer science

JOHN M. BOYER

John is a senior product architect and research scientist for PureEdge Solutions. He can be contacted at [email protected] or [email protected].


“Prior methods are more complex because they try to determine whether a whole vertex or path can be added as a batch operation”


are always connected only to the outside of the partial embedding G~. Hence, a biconnected component may need to be flipped before it is merged. For example, the lower biconnected component in Figure 1(d) was merged but also flipped on the vertical axis from r to w to keep y on the outside, which is called the "external face" of the embedding.

Overview of the Algorithm

This section assumes you know a little about how to perform a depth-first search (DFS) on a graph, that each vertex is assigned an index according to when it is visited, and that the search identifies a DFS tree within the graph. All edges in the graph that are not in the DFS tree are called "back edges." An embedding data structure G~ maintains a collection of combinatorial planar embeddings of the biconnected components that develop as each edge from the input graph G is added. Each biconnected component has a "root" vertex that has the least depth-first index in the biconnected component and is the cut vertex separating the biconnected component's vertices from DFS ancestors of the root. In the embedding structure, the root r of each biconnected component is represented by a virtual vertex, typically denoted with a single quote ('). A cut vertex is represented by a virtual vertex in each biconnected component for which it is the root, and by a nonvirtual vertex in the biconnected component in which it does not have the least depth-first index.
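The depth-first indices (DFIs) this section assumes can be sketched in a few lines of C. This is my own minimal adjacency-matrix illustration, not the reference implementation's adjacency-list structures:

```c
#include <string.h>

/* Assign depth-first indices: each vertex receives the next counter
   value the first time the search visits it, so DFS ancestors are
   always numbered before their descendants. */
#define MAXV 8

static int adj[MAXV][MAXV];     /* adjacency matrix                   */
static int dfi[MAXV];           /* depth-first index, -1 = unvisited  */
static int dfi_counter;

static void dfs_number(int n, int v)
{
    dfi[v] = dfi_counter++;
    for (int w = 0; w < n; w++)
        if (adj[v][w] && dfi[w] < 0)
            dfs_number(n, w);
}

static void number_graph(int n)
{
    dfi_counter = 0;
    memset(dfi, -1, sizeof dfi);    /* all-ones bytes give -1 per int */
    for (int v = 0; v < n; v++)
        if (dfi[v] < 0)
            dfs_number(n, v);
}
```

The reverse iteration the article describes then simply walks these indices from highest to lowest, so every vertex is processed after all of its DFS descendants.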

The planarity algorithm begins by first adding each depth-first search (DFS) tree edge (p, c) to G~ as a singleton biconnected component containing the edge (p', c). Then, the vertices are processed in reverse order of their depth-first indices to add the back edges between each vertex v and its descendants. Biconnected components are merged at their cut vertices as the edge that biconnects them is embedded.

In a depth-first search numbering, the DFS ancestors are numbered before their descendants, so the reverse iteration by DFI means that while processing a vertex v, the back edges from v to its descendants are added, but the back edges from the ancestors of v to both v and its descendants will not be added until future steps. Thus, while processing vertex v, all descendants of v with back-edge connections to the DFS ancestors of v must be kept on the external face (the outside) because the algorithm only adds edges incident to vertices that are kept on the external face (the reason for this is tied up with the proof of correctness in the journal paper).

The detailed operation of this processing model is supported by the following definitions: A vertex x is "externally active" if the input graph G contains a back edge (u, x) where u is a DFS ancestor of the current vertex v being processed, or if x has a DFS child cx in a separate biconnected component Bcx in the embedding G~ and the input graph G contains a back edge (u, w) where u is a DFS ancestor of the current vertex v being processed and w is in the DFS subtree rooted by cx. Similarly, a vertex w is "pertinent" in step v if there exists a back edge (v, w) in the input graph G that has not been embedded in G~, or if w has a DFS child cw in a separate biconnected component Bcw in the embedding G~, and the input graph G contains a back edge (v, z) where z is in the DFS subtree rooted by cw, and (v, z) has not yet been embedded in G~. A "pertinent biconnected component" contains a pertinent vertex. A vertex or biconnected component is "internally active" if it is pertinent but not externally active. A "stopping vertex" is externally active but not pertinent. The implementation of these definitions is very fast, involving only constant time per query due to the creation and careful maintenance of a few simple lists and values at each vertex.

The Walkdown

Again, a main loop processes each vertex v in descending depth-first index order. To process v, the back edges between v and its descendants are embedded. For each DFS child c of v, a procedure called "Walkdown" embeds the back edges between v and descendants of c. In a depth-first manner, the Walkdown traverses from


pertinent vertices to pertinent child biconnected components along the external face paths until a descendant d directly adjacent to v is found. The pertinent vertices encountered along the way are called "separation ancestors" of d, and the Walkdown collects them on a separation ancestor stack. Once d is found, biconnected components are merged at the vertices on the separation ancestor stack, and the edge (v, d) is added to biconnect them.

The Walkdown performs two traversals from v through c to descendants of c. The first traversal proceeds in a counterclockwise direction, and biconnected components are merged and back edges added until the traversal is terminated by encountering a stopping vertex x. The second traversal performs the same operations, only in the clockwise direction, until it is also terminated by a stopping vertex y.

Overall Effect of the Walkdown

It is helpful to see an example of the overall effect of a Walkdown on the entire pertinent subgraph (the collection of pertinent biconnected components). Figure 2 shows the state immediately before the Walkdown of an example set of biconnected components (ovals), externally active vertices (squares), and descendant endpoints of unembedded back edges (small circles). The dark ovals are internally active, the shaded ovals are pertinent but externally active, and the light ovals are nonpertinent. Figure 3 shows the result of the Walkdown processing over the example of Figure 2.

The first traversal Walkdown descends to vertex c, then biconnected component A is selected for traversal because it is internally active, whereas B and G are pertinent but externally active. The back edges to vertices along the external face of A are embedded and then the traversal returns to c. Biconnected component B is chosen next, and it is flipped so that traversal can proceed toward the internally active vertex in B. The back edge to the vertex in B is embedded and the root of B is merged with c. Then, the traversal proceeds to the nonvirtual counterpart of the root of D, which is externally active because D is externally active. The traversal continues to the root of D, then to the nonvirtual counterpart of the root of E rather than the nonvirtual counterpart of the root of F; both are externally active, but the path to the former is selected because it is pertinent. Traversal proceeds to the internally active vertex in E to embed the back edge, at which time D and E become part of the biconnected component rooted by v'. Finally, traversal continues along E until the first traversal is halted by the stopping vertex x.

The second Walkdown traversal proceeds from v' to c to the biconnected component G, which is flipped so that the internal activity of H, I, and J can be resolved by embedding back edges. The back edges to I and J are embedded between the first and second back edges that are embedded to H. The bounding cycles of the internally active biconnected components are completely traversed, and the traversal returns to G. Next, the roots of M, N, and O are pushed onto the merge stack, and N is also flipped so that the traversed paths become part of the new proper face that is formed by embedding the back edge to the vertex in O. Finally, the second traversal is halted at the stopping vertex y.

Generally, the first traversal embeds the back edges to the left of tree edge (v', c), and the second traversal embeds the back edges on the right. As this occurs, the externally active parts of the graph are kept


Figure 1: (a) A cut vertex r; (b) removing r results in more connected components; (c) the biconnected components separable by r; (d) when edge (v, w) is added, r is no longer a cut vertex (by flipping the lower biconnected component, y remains on the external face).


on the external face by permuting the children of c (for example, selecting A before B and G) and by biconnected component rotations. The internally active biconnected components and pertinent vertices are moved closer to v' so that their pertinence can be resolved by embedding back edges. The internally active vertices and biconnected components become inactive once their pertinence is resolved, which lets them be surrounded by other back edges as the Walkdown proceeds.

Using the Implementation
To reify the conceptual overview provided in this article, I've also provided a reference implementation (available electronically; see "Resource Center," page 5) that shows the structures used to represent a graph as well as the basic algorithms, such as depth-first search, operating over those structures. It is easy to use the implementation to learn a lot more about edge-addition planarity because the implementation is written in plain C, highly structured, and copiously commented.

Of course, the code itself is organized to make it easy to use the implementation to begin solving planarity-related problems. The main header file, graph.h, contains declarations of all the functions available. Here are the main ones to consider:

• gp_New( ) allocates an empty graph structure and returns a pointer to it.

• gp_Free( ) frees a graph data structure and nulls out the pointer. Take care to pass the address of the pointer returned by gp_New( ).

• gp_InitGraph( ), given N, allocates within a graph structure enough memory for N vertices and 3N edges.

• gp_AddEdge( ) allows the addition of a single edge to a previously created and initialized graph.

• gp_Write( ) writes the graph to a file in an adjacency-list format.

• gp_Read( ) allocates and initializes a graph, then adds edges to it according to the content of a given file (preferably one created in the style produced by gp_Write( )).

• gp_Embed( ) is the main function; it receives a graph and rearranges it to produce either a combinatorial planar embedding or a minimal nonplanar subgraph.

• gp_SortVertices( ) can be used after gp_Embed( ) to recover the original numbering of the graph as it appeared, for example, in the input file. By default, gp_Embed( ) leaves the graph with its depth-first search numbering, not the original numbering.

There are a number of prior linear-time planarity algorithms. However, this new method is both simpler and faster than prior approaches. The prior methods are more complex in part because they try to determine whether a whole vertex or path can be added as a batch operation. The C implementation provided with this article is intended to be immediately accessible, yet interest in the speed and simplicity of the method has already resulted in several independent implementations. For some examples, see the Magma computational algebra system (http://magma.maths.usyd.edu/magma/) and the Gravisto open-source Java toolkit (http://www.gravisto.org/), which also implements graph visualization methods.

DDJ


Figure 2: Before the Walkdown on v'.


Figure 3: After the Walkdown on v'.



In spite of the ubiquity of computers, data is often processed in batches. Historically, batch processing was a way to optimize the use of rare and precious machine resources. People prepared data offline, and when they did their batch processing, they were assured of consuming 100 percent of the available computing resources. As the price of computing fell and the number of computers grew, we stopped optimizing machine time. For evidence, consider the number of processor cycles spent running screen savers.

Some transactions are very complex, leading us to create hybrid applications. These have an interactive front end that works as quickly as necessary to make the human users productive. Additionally, they have a batch-oriented back end to process transactions slowly, when no one is tapping their foot waiting for results.

Another kind of batch operation is an analytical process where you may group data into bands. Sometimes you only want the bottom 10 customers or the vendors with the nearest delivery dates. In this case, we don't want to see every vendor delivery or every customer; we only want to see the few that we can take action to help.

In both cases, we are selecting the top (or bottom) N rows from a database. This is a common SQL specialization that is not part of the standard feature set of SQL. There are two definitions for the ordinary SELECT statement: single row and multiple rows (meaning all rows). There are various vendor-specific extensions to SQL to facilitate getting only a limited number of rows. However, there are a number of problems with these constructs.

For small tables, there isn't a terribly big problem here. The time required to table scan a few thousand rows is minimal. If you only want the first 100, you can't really avoid loading many of the remaining 900 into cache because of read-ahead strategies. If the table is reasonably well used, the entire thing may be lurking in cache anyway, making the processing time negligible.

In very large processing contexts, such as financial institutions or utilities, or in a data-warehousing context, the number of rows may stretch into the hundreds of thousands, making a table scan far too expensive. In this article, we focus on large tables, with over 100,000 rows of data.

The problem we examine is how to pick out just N rows from a very large table as efficiently as possible. In this case, efficiency will be the elapsed time to return the entire set of rows. Our aim is to minimize the kinds of overheads that creep into this kind of problem when we take too many details as given parts of the technology, not as choices we make in creating a solution.

Order By and ROWNUM
Many RDBMS products can fudge in a row number as part of the query results. In Oracle, the ROWNUM column provides a number for each row returned. In DB2 and MySQL, there is a LIMIT clause that can be used to return only the selected rows that comprise a batch.

The canonical example in Oracle is something like Example 1. This has the

Processing Rows In Batches

Solving the really big database problems

STEVEN F. LOTT AND ROBERT LUCENTE

Steven and Robert are database developers. They can be contacted at [email protected] and [email protected], respectively.




unfortunate side effect of sorting the entire table prior to locating the top N rows. Further, the temporary storage used for sorting can be a scarce resource. If multiple client processes are sorting concurrently, resources can be exhausted, leading to individual application crashes as well as making the system unresponsive.

Because Oracle assigns the row numbers before sorting, we have to use the inline view technique in Example 1. The data is sorted by the view, then row numbers are assigned for picking off a batch of rows. A large sort is done before any rows are returned. For systems like MySQL and DB2 with LIMIT clauses, the syntax is slightly simpler (see Example 2), but the performance is no better.

Some vendors offer a RANK analytic function that can be used with the OVER clause to provide ranking values in complex queries. As with the previous use of ROWNUM or LIMIT clauses, this will query, sort, and process the entire table before returning a useful result set.

One of the most common methods for selecting the top N rows from a result set is to write a short piece of application code. The idea is that the application program does not fetch rows beyond the N required rows. Unfortunately, the sort still gets done. Example 3 is handy for measuring the magnitude of the problem. The delay due to sorting is seen by measuring the time to process each row. The first call to next() reflects the time to do the sort and fetch the first batch of rows into cache. The remaining calls to next() run extremely quickly.

One Scan Only
To avoid sorting all of the rows in the table, you need to focus your sorting on just a subset of those rows. Example 4 exemplifies this approach. This seeds a TreeMap with keys and data elements for the first N rows. The first key is the minimal key in the set. As rows are fetched, they are compared against the first key. If the new row's key is less than or equal to this first of N keys, the new row is ignored. If the new row's key is greater than this first of N keys, then the previous minimum key is discarded and this new row is inserted.

The sorting is reduced to tree insertion for a subset of the rows. If the rows are in a random order, then approximately half will be ignored. If the rows are in descending order, then all but the first N will be ignored. If the rows are ascending, then the N-row subset will have each of the table rows sorted in and then removed.



Example 1: Batches via ORDER BY and ROWNUM.

SELECT X, Y, Z
FROM (SELECT X, Y, Z
      FROM SOMETABLE
      ORDER BY X) TEMPVIEW
WHERE ROWNUM <= 100;

Example 2: Batches via LIMIT.

SELECT X, Y, Z
FROM SOMETABLE
ORDER BY X
LIMIT 100;

void topNRows( Connection db ) throws SQLException {
    String someQuery = "SELECT X, Y, Z FROM SOMETABLE ORDER BY X";
    int N = 100;
    PreparedStatement firstN = db.prepareStatement( someQuery );
    ResultSet rs = firstN.executeQuery();
    for( int i = 0; i != N && rs.next(); ++i ) {
        // process the row
    }
    // assert (N rows processed) or (no more rows)
    rs.close();
    firstN.close();
}

Example 3: Java partial fetch.

import java.sql.*;
import java.util.*;

/**
 * Collects the N rows with the largest key values from a ResultSet.
 * This version is hard-wired to expect a String in column 1;
 * the sort key is an int in column 2.
 * @author slott
 */
public class TopNRows {
    /** Number of rows to keep. */
    int keep;
    /** Set of top N rows. */                                  // (1)
    TreeMap topRows;

    /**
     * Creates a new instance of TopNRows.
     * @param keep top number of values to keep.
     */
    public TopNRows( int keep ) {
        this.keep = keep;
    }

    /**
     * Scans the given result set, checking column 2, the integer key,
     * for the largest values.
     * @param rs ResultSet to scan
     */
    public void scan( ResultSet rs ) throws SQLException {
        topRows = new TreeMap();
        while( rs.next() ) {
            Integer rowKey = new Integer( rs.getInt(2) );      // (2)
            if( topRows.size() < keep ) {
                topRows.put( rowKey, rs.getString(1) );
                continue;
            }
            Integer minKeepKey = (Integer)topRows.firstKey();  // (3)
            if( rowKey.compareTo( minKeepKey ) > 0 ) {
                topRows.remove( minKeepKey );
                topRows.put( rowKey, rs.getString(1) );
            }
        }
    }

    /**
     * Returns an iterator over the selected results.
     * These will be Map.Entry objects. The key will be column 2 values,
     * transformed to Integers. The entry will be column 1 values,
     * still Strings.
     * @return Iterator over the Map.
     */
    public Iterator iterator( ) {
        return topRows.entrySet().iterator();
    }
}

Example 4: Fast Java fetch.


Where a full sort of R rows is O(R log(R)), this sort is somewhat smaller, at O(R log(N)). When the table is very large and the batch size is very small, this difference can be profound. When we are trying to find a 100-row batch from a 100,000-row table, the sorting effort is cut to about 40 percent (log 100/log 100,000 = 2/5), even in the worst possible case.
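That ratio can be checked with a quick back-of-the-envelope calculation; this sketch counts only the O() comparison terms and ignores the constant factors of a real sort:

```python
import math

R = 100_000   # rows in the table
N = 100       # rows in the batch

full_sort = R * math.log2(R)   # full sort of the table, O(R log R)
top_n = R * math.log2(N)       # worst case for the N-entry tree, O(R log N)

ratio = top_n / full_sort      # log(N)/log(R), independent of R's coefficient
print(f"ratio of worst-case top-N cost to full-sort cost: {ratio:.2f}")
```

The ratio is exactly log(100)/log(100,000) = 2/5 here, and it only improves as the table grows relative to the batch.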

Point #1 in Example 4 is the TreeMap into which we'll accumulate, at most, keep rows of keys and data values. Since the map isn't full at Point #2, we load in key values and data values. In this case, we only use a single column that we presume is a primary key. If necessary, we could construct and insert a more complex object.

Once the map is full at Point #3, we compare each new row against the smallest of the keys in topRows. If the new row is larger, it should be in the collection; we drop the lowest value from the collection. Since this is a balanced red-black tree, the tree will be reordered and balanced as necessary after this operation.

Fork and Split
As with many such problems, the real problem is not to get the top N rows. The real problem is to break up a stream of incoming transactions into batches. The current architecture puts transactions into a large table, then repeatedly picks out batches for processing. This is, of course, slow, and the initial narrow focus on a specific performance issue leads to this investigation.

The table and the consequent table scans are not an essential feature of the problem. The table is merely persistence for transactions until they have been fully processed. It is little more than a recovery mechanism in case the transaction processor fails. From this point of view, it is a slow and expensive version of a reliable message queue, and any number of vendors provide reliable message delivery without using a large, expensive relational database.

Once you realize this, there is a canonical UNIX solution that works with almost no overhead:

1. A transaction enters the system. It is inserted into the database for reliability purposes. It is also written to a temporary file for processing.

2. When the temporary file has a batch of N records in it, the system closes the file and then forks a subprocess to execute the batch of transactions in that file.

3. The subprocess is given a file of transactions. Each transaction in the file can also be found in the database. When the subprocess has finished processing a transaction, it deletes that transaction from the database, clearing out the reliability information.

In the event that the transaction processors crash, the unprocessed transactions in the database can be queried and split up into transaction files, and transaction subprocesses created. The only difference from normal operations is the origin of the transactions.
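Steps 1 and 2 of that scheme can be sketched as follows. This is only an illustration: the Batcher class, BATCH_SIZE, and the spawn callback are names of my own invention, and the places where a real system would INSERT into the database or fork a worker process are marked in comments.

```python
import os
import tempfile

BATCH_SIZE = 3  # N; kept tiny here purely for illustration

class Batcher:
    """Accumulates incoming transactions in a temporary file and hands
    each full file of BATCH_SIZE records off for subprocess execution."""

    def __init__(self, spawn):
        self.spawn = spawn   # callable given the path of a closed batch file
        self.count = 0
        self.file = None

    def submit(self, transaction):
        # Step 1: a real system would also INSERT the transaction into
        # the database here, purely as crash-recovery insurance.
        if self.file is None:
            fd, self.path = tempfile.mkstemp(suffix=".txn")
            self.file = os.fdopen(fd, "w")
        self.file.write(transaction + "\n")
        self.count += 1
        # Step 2: on a full batch, close the file and hand it to a worker
        # (a real system would fork/exec a batch processor here).
        if self.count == BATCH_SIZE:
            self.file.close()
            self.file, self.count = None, 0
            self.spawn(self.path)

# For the demo, just record the batch-file paths instead of forking.
batches = []
b = Batcher(spawn=batches.append)
for i in range(7):
    b.submit(f"txn-{i}")
print(len(batches))   # two full batches dispatched; one partial still open
```

The crash-recovery path in the text is the same code with a different feeder: the unprocessed rows queried from the database are simply submitted to the batcher instead of live transactions.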

Conclusion
It is often difficult to fix the real problem. In this case, the overall architecture was an effort at inventing a reliable message queue.

However, since the application is working in production, it's difficult to replace it with a simpler, more focused, and reliable message-queue product. Instead, we're stuck with Home-Brewed Reliable Message Queuing (HBR-MQ).

Rather than fix the performance problems, we can mask them by using a high-performance application that fetches batches of transactions without the overhead of table sorts. The additional complexity raises the cost of maintenance and makes it more difficult to diagnose system problems. However, it can run considerably faster.

In any application with large data volumes, sorting should be avoided. In some cases, the entire DBMS should be avoided. Where the database can't be avoided, the performance of each processing step must be considered to ensure that solutions are reliable and scalable.

DDJ



There are three methods by which geographical knowledge is conveyed in digital form. The first (and usually best) method is via a set of “round-Earth” coordinates of point, line, and area abstractions of objects defining their whereabouts on or near the Earth's ellipsoidal (or at least spherical) surface. Geographic information in this form can be processed with a computer to yield accurate quantitative analysis and can subsequently be cast into any number of flat-panel or paper displays for presentation to human eyes.

A second (and second-best) method is via a set of “flat Earth” coordinates for the same surface object abstractions, but having already fictitiously located them on a flat Earth, using any one of many different planar projections. Geographic information of this type permits less accurate quantitative analysis and has obvious limitations at the edges of the projection.

Digital maps based on either of the foregoing methods are referred to as “vector” maps.

The least desirable method for the conveyance of geographical information is via digital images from the start. Very little analysis is possible, and about all that can reasonably be done with them is to use a computer to present the images (or parts of them) for viewing by human eyes, possibly with the addition of some application-specific content.

Why, then, would it be worthwhile to consider a file format suited specifically to “image” maps? Primarily, because digital map images can easily be obtained by scanning existing paper maps.

While there is an ever-increasing availability of high-quality vector geography, there are many instances where vector data either isn't available or can be obtained only at prohibitive cost. However, a paper map often may be obtained and scanned inexpensively. And even though it is only an image, a scanned map may contain information not readily available in vector form.

There are other sources of digital map images. Sometimes images photographed from flying or orbiting platforms may be distributed in raw form without ever having undergone the typically labor-intensive photogrammetric and interpretive processes that are used by cartographers to turn aerial photographs or satellite images into paper maps.

Regardless of image source, the ability to associate each pixel of a map image with actual ground coordinates will usually exist. Knowledge of this association enables many practical computer applications such as GPS navigation. We propose that such applications can benefit greatly from the use of the cross-platform file format and C library components presented here. With that in mind, when we talk about computer maps for the rest of this article, we are referring to the third type: image maps.

Maps as Computer Images
Maps are a peculiar subspecies of computer images for two main reasons:

• Unlike business graphics or photographs, map images are usually quite large. After all, they are often derived from large conventional maps. As such, their pixel count is likely to be quite high, requiring many megabytes of external storage. Furthermore, an application may require that imagery from more than one map be stitched together, making the combined image even larger.

• There are two fundamentally different kinds of computer images. There are those that are composed of lines, lettering, and areas of uniform color having sharp outlines. Then there are photographs or other material obtained by imaging devices, having continuously changing colors and other acquisition artifacts. A map as an image can have elements of both types; for instance, a shaded background with line work and lettering on top of it. Imagery of the first type is commonly stored as .png, which is lossless, and imagery of the second

TileShare: Maps as Computer Images

A cross-platform file format and library for scanned map images

HRVOJE LUKATELA AND JOHN RUSSELL

Hrvoje and John are principals of Geodyssey Limited (http://www.geodyssey.com/). Hrvoje can be contacted via http://www.lukatela.com/hrvoje/ and John via [email protected].




type as .jpg, which is lossy. A truly general and comprehensive map image file format should thus accommodate both types of compression.

Design Objectives
To be useful for the storage of map images, a graphical file format should be:

• Capable of handling very high pixel-count collections. Even a single map sheet of 700×800 mm (28×32 inches) scanned at a resolution of 150 dpi becomes a 4200×4800-pixel bitmap.

• Randomly accessible. The granularity of compression commonly used in graphical file formats takes in a complete scan line, the width of the entire image. However, since we assume that not the whole file, but only a small portion of it, will be on screen at any one time, the “atom” of compression should be a two-dimensional subset of the file (a tile) instead of a one-dimensional subset (a scan line).

• Organized for access efficiency. Compressed tiles that are close to each other on the ground should, as much as possible, be close to each other in the file.

• Memory mappable. Since map image files are unlikely to change in the hands of the end user, and may be accessed directly from read-only media, applications should be able to access them in a memory-mapped mode.

• Computationally tractable. The direct and inverse mappings between each pixel and its geographic coordinates must be simple to calculate.

• Cross-platform usable. Map image information on disk or CD should accommodate processing on both big- and little-endian byte-order architectures, with a minimum of overhead. Also, the files should be processable reasonably well on hardware platforms having no floating-point hardware, as is often the case for handheld PDA devices or embedded applications.

Tileset File Format and Supporting Computations
In this article, we present two essential ingredients for an effective map image management and access system: an efficient cross-platform file-format specification and a library of C source code to perform the necessary geometric and indexing computations. With these two elements (and some rudimentary application code), you should be able to easily jump-start an application project that makes use of scanned maps or other geographic imagery.

In addition to these two software components, we assume that two other widely available software components will be used: zlib compression (for .png-like images) and JPEG library compression (for .jpeg images).

We expect that this technology will be used primarily for maps of relatively large scale (1:25k–1:250k): maps that cover reasonably large local, possibly regional, but not continental areas, and that do not extend into very high latitudes (60+ degrees).

We describe the file format only in general terms. The details are specified in the tileset.h header file (available electronically; see "Resource Center," page 5). We examine in some detail a basic form of the file: one that has only a single image layer of 24 bits per pixel (bpp) color, compressed using the lossless zlib compression.

The file is a binary replication of C language structures. It consists of three main blocks of information:

• A general information header.

• A table of values required to index tile storage and perform tile- or pixel-to-ground direct and inverse mapping.

• The compressed image tiles themselves, organized in a latitude/longitude matrix (see Figure 1).

The header provides basic information about the file: the size of each tile, the tile row and column counts, and so on.

Next to the file header is a structure that defines how the tiles and pixels are related to geographic locations. At present, only one such structure, a description of the “rectangular tileset projection,” is provided. All but one longitude in the file are relative to the tileset mid-longitude value specified therein. This significantly reduces the number of instances in the code where the cyclic nature of the longitude domain must be taken into account.
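The payoff of storing longitudes relative to a mid-longitude is that the wraparound at ±180 degrees has to be handled only once, on the way in and out. The real transformations live in tileset.c; this sketch (with a function name of my own choosing) shows just the normalization idea:

```python
import math

def rel_longitude(lon, mid_lon):
    """Express longitude lon (radians) relative to mid_lon,
    normalized into the half-open interval (-pi, pi]."""
    d = math.fmod(lon - mid_lon, 2 * math.pi)
    if d <= -math.pi:
        d += 2 * math.pi
    elif d > math.pi:
        d -= 2 * math.pi
    return d

# A tileset centered on the dateline: 179 deg E and 179 deg W are only
# 2 degrees apart once both are made relative to the mid-longitude.
mid = math.radians(180.0)
east = rel_longitude(math.radians(179.0), mid)    # about -1 degree
west = rel_longitude(math.radians(-179.0), mid)   # about +1 degree
print(round(math.degrees(west - east), 6))        # 2.0
```

Once every stored longitude is relative, ordinary subtraction and comparison work across the dateline without any further special cases.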

Following the geometry-defining structure, there is an array of latitude values that specify the latitude at each tile boundary, starting at the south boundary of the southernmost row of tiles and progressing north, ending with the latitude of the north boundary of the northernmost row of tiles.

Next in the file is an index table providing a file-relative pointer to each compressed tile and its size.

Dual Endian
All of the structures and tables just described contain only two types of data: single-byte characters and 4-byte integers. To make it possible to process the file as a read-only file in memory-mapped mode, and to avoid reading complete header structures and tables into memory and possibly having to reverse the byte order in the integers, the foregoing is repeated again in its opposite-endian form. (It makes no difference which endian version occurs first and which second.) In comparison with the tile storage itself, this overhead is insignificant.
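The dual-endian idea can be sketched with Python's struct module. The field names here are invented for illustration; the actual header layout is defined in tileset.h:

```python
import struct

def dual_endian_header(tile_w, tile_h, rows, cols):
    """Emit the same four 4-byte integers twice, little-endian then
    big-endian, so a consumer on either architecture can memory-map
    its native copy and never swap bytes."""
    fields = (tile_w, tile_h, rows, cols)
    return struct.pack("<4i", *fields) + struct.pack(">4i", *fields)

hdr = dual_endian_header(256, 128, 10, 12)
# A reader simply picks whichever copy matches its own byte order:
print(struct.unpack("<4i", hdr[:16]))   # (256, 128, 10, 12)
print(struct.unpack(">4i", hdr[16:]))   # (256, 128, 10, 12)
```

Doubling a few dozen bytes of header is a trivial price next to megabytes of tile data, which is exactly the trade the article describes.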

Compression
zlib is used to compress and decompress the tiles with 24-bpp pixels. Again, the file design provides for “two-layer” maps: a background compressed with lossy JPEG compression, and linework and lettering in a png-like layer with transparency (see the details in tileset.h).

Indexing: Morton Order
One of TileShare's design principles assumes that large tilesets exist as disk files and that only a small number of tiles (presumably those that are currently visible on the screen or in the window) will be brought into main memory as needed. Depending on the type of external storage used, access can be considerably faster if tiles that are close to each other geographically are close to each other in the file. Since tiles cover a two-dimensional (latitude/longitude) domain and the file is essentially a one-dimensional object, this is possible only to a limited extent. Out of many different arrangements, the Morton order, modified so that rectangular arrangements with different x and y tile counts are accommodated, is used in tilesets. The details can be found in the extensive commentary of the ts_Morton( ) function in tileset.c (available electronically). The function is provided for use in programs that compose tilesets and will not, as a rule, be required by the programs that use tilesets once they have been created.
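In its plain square form, the Morton order simply interleaves the bits of a tile's column and row numbers, so tiles whose indices share high-order bits (that is, tiles that are geographically close) land at nearby file positions. The rectangular modification in ts_Morton( ) is not reproduced here; this is only the textbook square case:

```python
def morton(x, y, bits=16):
    """Interleave the bits of tile column x and tile row y:
    bit i of x goes to output bit 2i, bit i of y to bit 2i+1."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

# Neighboring tiles in a 4x4 grid cluster into 2x2 blocks of four:
for y in range(4):
    print([morton(x, y) for x in range(4)])
# [0, 1, 4, 5]
# [2, 3, 6, 7]
# [8, 9, 12, 13]
# [10, 11, 14, 15]
```

Reading the output row by row shows the point: the four tiles of any aligned 2×2 block occupy four consecutive file slots, so panning the viewer one tile in any direction touches file regions that are nearby, not scattered.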

Angular Coordinate Encoding & the “Discrete” Mercator Projection
Tilesets use only two data types: 8-bit bytes and 4-byte integers. The angular measures of latitudes and longitudes are therefore mapped from/to their canonical forms, which are signed real numbers in radian measure, into 4-byte signed integers using two macros in tileset.h: TS_I4OfAng( ) and TS_AngOfI4( ). The ground resolution of geographic coordinates in the integer representation is on the order of a centimeter: more than sufficient even for tilesets on the high end of the anticipated scale range.
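The exact scaling is defined in tileset.h; a natural choice, sketched here under my own assumption about the scale factor, maps the half-open interval [-π, π) onto the full range of a signed 32-bit integer. The function names merely mimic the real macros:

```python
import math

SCALE = 2**31 / math.pi   # assumed: radians -> signed 32-bit counts

def i4_of_ang(a):
    """Encode an angle in radians as a 4-byte signed integer."""
    return int(round(a * SCALE))

def ang_of_i4(i):
    """Decode a 4-byte signed integer back to radians."""
    return i / SCALE

# One integer step at the equator, in metres (mean Earth radius):
step_m = 6371000.0 * (1.0 / SCALE)
print(f"{step_m * 1000:.1f} mm per count")   # about 9.3 mm
```

Under this scaling, one count is a little under a centimetre on the ground, which is consistent with the resolution the article claims for the integer representation.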

All tiles in a tileset are rectangular, having identical pixel dimensions and the same longitudinal “width.” This means that a tile width measured on the ground changes continuously not only from one tile row to another, but also from one east-west line of pixels within the tile to the next. To preserve the similarity of object shapes between the map and the ground, the design calls for the tile mid-latitude width-to-height aspect ratio on the ground



to be the same as the equivalent aspect ratio on the map image. The tile width in pixels must be a multiple of eight, while the height is open. The choice will depend on the scope of the application and the anticipated use of the tileset. An example implementation might use 128 pixels in latitude (height) by 256 pixels in longitude (width). All tiles in any one east-west row have the same north-south extent on the ground.

This arrangement results in a different latitude extent for each row of tiles; each pixel has approximately the same extent on the ground in both east-west and north-south directions. Indeed, as the tiles get smaller, the geometry of the planar tileset pixel x-y system and geographic coordinates becomes more and more similar to the Mercator projection (which is itself a rigorous mathematical expression of such an arrangement with infinitesimally small tiles). It is therefore reasonable to call this mapping a “discrete” (as opposed to “continuous”) Mercator projection. Inside each tile, the latitudes and longitudes can be assumed to be linearly proportional to the pixel coordinates: If tiles have a relatively low north-south extent on the ground (on the order of several kilometers) and a relatively low pixel count in the north-south direction (hundreds), the error made by linear interpolation of latitudes will remain subpixel. There is no error as a result of linear interpolation of longitudes: these are, by definition, proportional to the x pixel coordinates.
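The per-tile linear interpolation amounts to one multiply and one add. A sketch (the function and parameter names are mine, not from tileset.c):

```python
def pixel_to_lat(row_pixel, tile_height_px, lat_north, lat_south):
    """Linearly interpolate the latitude of a pixel row inside a tile.
    Row 0 is the tile's northern edge; row tile_height_px is its
    southern edge. Latitudes are in degrees here for readability."""
    t = row_pixel / tile_height_px
    return lat_north + t * (lat_south - lat_north)

# Midpoint row of a 128-pixel-high tile spanning latitudes 45.1..45.0:
mid_lat = pixel_to_lat(64, 128, 45.1, 45.0)
print(round(mid_lat, 6))   # 45.05
```

The same formula, applied to x pixel coordinates and the tile's two bounding longitudes, is exact rather than approximate, for the reason given above.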

A program that constructs a tileset requires two simple geodetic computations: finding the length of an arc of meridian between two latitudes, and the length of an arc of the parallel of latitude between two longitudes. If the scale of the tileset is large and the tiles are small, the simple spherical form (in C) suffices:

meridianArcLength = earthRadius *
    (latitudeNorth - latitudeSouth);
parallelArcLength = earthRadius * cos(latitude) *
    (longitudeEast - longitudeWest);

Along with many other geodetic computations, the ellipsoidal version of these computations can be found in the Hipparchus Library (see http://www.geodyssey.com/).

TileShare Source Code
If you want to create and/or use tileset files in your applications, you'll need the two C language files, tileset.h and tileset.c (available electronically), that contain the definitions of all file structures and the C language source code for the functions that perform data transformation, geometric computations, and the indexing required to access tiles. The code can be used as source for a library, or the functions in it may be individually included in the application source code. Since one of the main advantages of the TileShare system is its cross-platform capability, the code is intentionally presented and made available with no dependency on a particular development environment. Both of these files are provided to application developers free of charge and free of copyright restriction, in the hope that they will enable the development of robust cross-platform geographical applications. Attribution of this design to Geodyssey Limited (http://www.geodyssey.com/) is appreciated.

Sample Viewers
To jump-start the development of applications that use tilesets, we have prepared programs for three different platforms: the ubiquitous Win32, Mac OS X, and Windows CE. All three are available in full source form.


Figure 1: Tileset File Layout (for rectangular projection, single lossless image, 24-bpp color).


All three programs open a read-only, cross-platform tileset file in “memory-mapped” mode and use only the most fundamental components of the graphical API of their respective platforms. This is intentional, as the purpose of the code is primarily to be reused in the development of more comprehensive applications (see Figure 2).

The viewer for Win32 includes the components required for communication with a GPS device and provides a “moving map” display of the tileset. Unlike most GPS navigation applications, which usurp the COM port and the device connected through it for their exclusive use, this program consists of two independent executables: a GPS monitor and a tileset display program.

Any number of programs that follow a simple data-exchange protocol can be executing simultaneously: perhaps multiple viewers displaying different tilesets for the same locale, or programs providing navigational computations and numeric data display.

Example Project on the Web
To provide end users with representative scanned map material and to allow developers to quickly become operational, we have also provided a suite of Win32 programs that comprise a “Tileset Builder's Toolkit.” The suite is freely available for download at http://www.geodyssey.com/tileshare/. Figure 3 presents a snapshot of the tileset build process using this toolkit.

DDJ


Figure 3: Tiled bitmap compile process.

Figure 2: Development of more comprehensive applications in different environments.


As software environments become more complex and programs get larger, it becomes more and more necessary to find ways to reduce code duplication and scattering of knowledge. While simple code duplication is easy to factor out into functions or methods, more complex code duplication is not. For example, if a method needs to be wrapped in a transaction, synchronized in a lock, or have its calls transmitted to a remote object, there often is no simple way to factor out a function or method to be called, because the part of the behavior that varies needs to be wrapped inside the common behavior.

A second and related problem is scattering of knowledge. Sometimes a framework needs to be able to locate all of a program's functions or methods that have a particular characteristic, such as "all of the remote methods accessible to users with authorization X." The typical solution is to put this information in external configuration files, but then you run the risk of the configuration being out of sync with the code. For example, you might add a new method but forget to also add it to the configuration file. And of course, you'll be doing a lot more typing, because you'll have to put the method names in the configuration file, and any renaming you do requires editing two files.

So no matter how you slice it, duplication is a bad thing for both developer productivity and software reliability — which is why Python 2.4's new "decorator" feature lets you address both kinds of duplication. Decorators are Python objects that can register, annotate, and/or wrap a Python function or method.

For example, the Python atexit module contains a register function that registers a callback to be invoked when a Python program is exited. Without the new decorator feature, a program that uses this function looks something like Listing One(a).

When Listing One(a) is run, it prints "Goodbye, world!" because when it exits, the goodbye() function is invoked. Now look at the decorator version in Listing One(b), which does exactly the same thing, but uses decorator syntax instead — an @ sign and expression on the line before the function definition.

This new syntax lets the registration be placed before the function definition, which accomplishes two things. First, you are made aware that the function is an atexit function before you read the function body, giving you a better context for understanding the function. With such a short function, it hardly makes a difference, but for longer functions or methods, it can be very helpful to know in advance what you're looking at. Second, the function name is not repeated. The first program refers to goodbye twice, so there is more duplication — precisely the thing we're trying to avoid.

Why Decorate?
The original motivation for adding decorator syntax was to allow class methods and static methods to be obvious to someone reading a program. Python 2.2 introduced the classmethod and staticmethod built-ins, which were used as in Listing Two(a). Listing Two(b) shows the same code using decorator syntax, which avoids the unnecessary repetitions of the method name, and gives you a heads-up that a classmethod is being defined.

While this could have been handled by creating a syntax specifically for class or static methods, one of Python's primary design principles is that: "Special cases aren't special enough to break the rules."

That is, the language should avoid having privileged features that you can't reuse for other purposes. Since class methods and static methods in Python are just objects that wrap a function, it would not make sense to create special syntax for just two kinds of wrapping. Instead, a syntax was created to allow arbitrary wrapping, annotation, or registration of functions at the point where they're defined.

Many syntaxes for this feature were discussed, but in the end, a syntax resembling Java 1.5 annotations was chosen. Decorators, however, are considerably more flexible than Java's annotations, as they are executed at runtime and can have arbitrary behavior, while Java annotations are limited to only providing metadata about a particular class or method.

Creating Decorators
Decorators may appear before any function definition, whether that definition is part of a module, a class, or even contained in another function definition. You can even stack multiple decorators on the same function definition, one per line.

But before you can do that, you first need to have some decorators to stack. A decorator is a callable object (like a function) that accepts one argument — the function being decorated. The return value of the decorator replaces the original function definition. See the script in Listing Three(a), which produces the output in Listing Three(b), demonstrating that the mydecorator function is called when the function is defined.

For the first example decorator, I had it return the original function object unchanged, but in practice, it's rare that you'll do that (except for registration decorators). More often, you'll either be annotating the function (by adding attributes to it), or wrapping the function with another function, then returning the wrapper. The returned wrapper then replaces the original function. For example, the script in Listing Four prints "Hello, world!" because the does_nothing function is replaced with the return value of stupid_decorator.

Python 2.4 Decorators
Reducing code duplication and consolidating knowledge

PHILLIP EBY

Phillip is the author of the open-source Python libraries PEAK and PyProtocols, and has contributed fixes and enhancements to the Python interpreter. He is the author of the Python Web Server Gateway Interface specification (PEP 333). He can be contacted at [email protected].

Objects as Decorators
As you can see, Python doesn't care what kind of object you return from a decorator, which means that for advanced uses, you can turn functions or methods into specialized objects of your own choosing. For example, if you wanted to trace certain functions' execution, you could use something like Listing Five.

When run, Listing Five prints "entering" and "exiting" messages around the "Hello, world" function. As you can see, a decorator doesn't have to be a function; it can be a class, as long as it can be called with a single argument. (Remember that in Python, calling a class returns a new instance of that class.) Thus, the traced class is a decorator that replaces a function with an instance of the traced class.

So after the hello function definition in Listing Five, hello is no longer a function, but is instead an instance of the traced class that has the old hello function saved in its func attribute.

When that wrapper instance is called (by the hello() statement at the end of the script), Python's class machinery invokes the instance's __call__() method, which then invokes the original function between printing trace messages.

Stacking Decorators
Now that we have an interesting decorator, you can stack it with another decorator to see how decorators can be combined.

The script in Listing Six prints "Called with class <class '__main__.SomeClass'>", wrapped in "entering" and "exiting" messages. The ordering of the decorators determines the structure of the result. Thus, someMethod is a classmethod descriptor wrapping a traced instance wrapping the original someMethod function. So, outer decorators are listed before inner decorators.

Therefore, if you are using multiple decorators, you must know what kind of object each decorator expects to receive, and what kind of object it returns, so that you can arrange them in a compatible wrapping order, in which the output of the innermost decorator is compatible with the input of the next-outer decorator.

Usually, most decorators expect a function on input, and return either a function or an attribute descriptor as their output.

The Python built-ins classmethod, staticmethod, and property all return attribute descriptors, so their output cannot be passed to a decorator that expects a function. That's why I had to put classmethod first in Listing Six. As an experiment, try reversing the order of @traced and @classmethod in Listing Six, and see if you can guess what will happen.

Functions as Decorators
Because most decorators expect an actual function as their input, some of them may not be compatible with our initial implementation of @traced, which returns an instance of the traced class. Let's rework @traced such that it returns an actual function object, so it'll be compatible with a wider range of decorators.

Listing Seven provides the same functionality as the original traced decorator, but instead of returning a traced object instance, it returns a new function object that wraps the original function. If you've never used Python closures before, you might be a little confused by this function-in-a-function syntax.

Basically, when you define a function inside of another function, any undefined local variables in the inner function will take the value of that variable in the outer function. So here, the value of func in the inner function comes from the value of func in the outer function.

Because the inner function definition is executed each time the outer function is called, Python actually creates a new wrapper function object each time. Such function objects are called "lexical closures," because they enclose a set of variables from the lexical scope where the function was defined.

A closure does not actually duplicate the code of the function, however. It simply encloses a reference to the existing code, and a reference to the free variables from the enclosing function. In this case, that means that the wrapper closure is essentially a pointer to the Python bytecode making up the wrapper function body, and a pointer to the local variables of the traced function during the invocation when the closure was created.

Because a closure is really just a normal Python function object (with some predefined variables), and because most decorators expect to receive a function object, creating a closure is perhaps the most popular way of creating a stackable decorator.
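As a minimal standalone sketch of this (written in modern Python 3 syntax, with invented names), each call to the outer function below builds a fresh closure over that call's local variables:

```python
def make_greeter(greeting):
    # 'wrapper' is a lexical closure: it encloses a reference to the
    # 'greeting' local of the particular make_greeter() call that made it.
    def wrapper(name):
        return "%s, %s!" % (greeting, name)
    return wrapper

# Two calls to the outer function yield two distinct function objects,
# each enclosing its own 'greeting'; the bytecode itself is shared.
hello = make_greeter("Hello")
howdy = make_greeter("Howdy")
```

Here hello("world") yields "Hello, world!" while howdy("world") yields "Howdy, world!", even though both closures share the same underlying code object.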

Decorators with Arguments
Many applications of decorators call for parameterization. For example, say you want to create a pair of @require and @ensure decorators so that you can record a method's precondition and postcondition.

Python lets us specify arguments with our decorators; see Listing Eight. (Of course, Listing Eight is for illustration only. A full-featured implementation of preconditions and postconditions would need to be a lot more sophisticated than this to deal with things like inheritance of conditions, allowing postconditions to access before/after expressions, and allowing conditions to access function arguments by name instead of by position.)

You'll notice that the require() decorator creates two closures. The first closure creates a decorator function that knows the expr that was supplied to @require(). This means require itself is not really the decorator function here. Instead, require returns the decorator function, here called decorator. This is very different from the previous decorators, and this change is necessary to implement parameterized decorators.

The second closure is the actual wrapper function that evaluates expr whenever the original function is called. Try calling the test() function with different numbers of arguments, and see what happens. Also, try changing the @require line to use a different precondition, or stack multiple @require lines to combine preconditions. You'll also notice that @require(expr="len(__args)==1") still works. Decorator invocations follow the same syntax rules as normal Python function or method calls, so you can use positional arguments, keyword arguments, or both.
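For instance, stacking two @require lines combines the preconditions, with the outermost check running first. This sketch restates Listing Eight's require() in Python 3 syntax; the divide() function is invented for illustration:

```python
def require(expr):
    def decorator(func):
        def wrapper(*__args, **__kw):
            # eval() sees this frame's locals, so expr may refer to __args.
            assert eval(expr), "Precondition failed"
            return func(*__args, **__kw)
        return wrapper
    return decorator

@require("len(__args) == 2")   # outer: checked first
@require("__args[0] > 0")      # inner: checked second
def divide(x, y):
    return x / y

# divide(4, 2) passes both checks; divide(-4, 2) fails the inner one
# with "Precondition failed".
```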

Function Attributes
All of the examples so far have been things that can't be done quite so directly with Java annotations. But what if all you really need is to tack some metadata onto a function or method for later use? For this purpose, you may wish to use function attributes in your decorator.

Function attributes, introduced in Python 2.1, let you record arbitrary values as attributes on a function object. For example, suppose you want to track the author of a function or method, using an @author() decorator? You could implement it as in Listing Nine. In this example, you simply set an author_name attribute on the function and return it, rather than creating a wrapper. Then, you can retrieve the attribute at a later time as part of some metadata-gathering operation.
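Such a metadata-gathering operation can be as simple as scanning a namespace for tagged functions. In this sketch (Python 3 syntax; find_authored() is an invented helper, not a library call), @author tags a function and a later pass collects the tags:

```python
def author(author_name):
    # Decorator factory: tags the function with an author_name attribute.
    def decorator(func):
        func.author_name = author_name
        return func
    return decorator

@author("Lemony Snicket")
def sequence_of(unfortunate_events):
    pass

def untagged():
    pass

def find_authored(namespace):
    # Gather every callable in the namespace that carries the tag.
    return {name: obj.author_name
            for name, obj in namespace.items()
            if callable(obj) and hasattr(obj, "author_name")}
```

Calling find_authored(globals()) would report sequence_of under "Lemony Snicket" and skip untagged().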

Practicing "Safe Decs"
To keep the examples simple, I've been ignoring "safe decorator" practices. It's easy to create a decorator that will work by itself, but creating a decorator that will work properly when combined with other decorators is a bit more complex. To the extent possible, your decorator should return an actual function object, with the same name and attributes as the original function, so as not to confuse an outer decorator or cancel out the work of an inner decorator.

This means that decorators that simply modify and return the function they were given (like Listings Three and Nine) are already safe. But decorators that return a wrapper function need to do two more things to be safe:

• Set the new function's name to match the old function's name.

• Copy the old function's attributes to the new function.

These can be accomplished by adding just three short lines to our old decorators. (Compare the version of @require in Listing Ten with the original in Listing Eight.)

Before returning the wrapper function, the decorator function in Listing Ten changes the wrapper function's name (by setting its __name__ attribute) to match the original function's name, and sets its __dict__ attribute (the dictionary containing its attributes) to the original function's __dict__, so it will have all the same attributes that the original function did. It also changes the wrapper function's documentation (its __doc__ attribute) to match the original function's documentation. Thus, if you used this new @require() decorator stacked over the @author() decorator, the resulting function would still have an author_name attribute, even though it was a different function object than the original one being decorated.
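It is worth noting that later Python releases (2.5 and up) bundle exactly these assignments into functools.wraps; here is a sketch of the same @require written that way, in modern Python 3 syntax (so not applicable to the Python 2.4 this article covers):

```python
import functools

def require(expr):
    def decorator(func):
        @functools.wraps(func)   # copies __name__ and __doc__, updates __dict__
        def wrapper(*__args, **__kw):
            assert eval(expr), "Precondition failed"
            return func(*__args, **__kw)
        return wrapper
    return decorator

@require("len(__args) == 1")
def echo(*args):
    """Return the single argument."""
    return args[0]

# The wrapper masquerades as the original: echo.__name__ is 'echo'
# and echo.__doc__ is the original docstring.
```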

Putting It All Together
To illustrate, I'll use a few of these techniques to implement a complete, useful decorator that can be combined with other decorators. Specifically, I'll implement an @synchronized decorator (Listing Eleven) that implements Java-like synchronized methods. A given object's synchronized methods can only be invoked by one thread at a time. That is, as long as any synchronized method is executing, any other thread must wait until all the synchronized methods have returned.

To implement this, you need to have a lock that you can acquire whenever the method is executing. Then you can create a wrapping decorator that acquires and releases the lock around the original method call. I'll store this lock in a _sync_lock attribute on the object, automatically creating a new lock if there's no _sync_lock attribute already present.

But what if one synchronized method calls another synchronized method on the same object? Using simple mutual exclusion locks would result in a deadlock in this case, so we'll use a threading.RLock instead. An RLock may be held by only one thread, but it can be recursively acquired and released. Thus, if one synchronized method calls another on the same object, the lock count of the RLock simply increases, then decreases as the methods return. When the lock count reaches zero, other threads can acquire the lock and can, therefore, invoke synchronized methods on the object again.
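The recursive behavior is simple to see in isolation; in this sketch (function names invented), one function holding the RLock calls another that re-acquires it without deadlocking:

```python
from threading import RLock

lock = RLock()

def outer():
    with lock:          # first acquisition: lock count goes to 1
        return inner()

def inner():
    with lock:          # same thread re-acquires: count goes to 2
        return "done"   # the releases drop the count back to 0 on the way out
```

Calling outer() returns "done"; with a plain threading.Lock, inner() would block forever waiting on the lock its own thread already holds.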

There are two little tricks being done in Listing Eleven's wrapper code that are worth knowing about. First, the code uses a try/except block to catch an attribute error in the case where the object does not already have a synchronization lock. Since in the common case the lock should exist, this is generally faster than using an if/then test to check whether the lock exists (because the if/then test would have to execute every time, but the AttributeError will occur only once).

Second, when the lock doesn't exist, the code uses the setdefault method of the object's attribute dictionary (its __dict__) to either retrieve an existing value of _sync_lock, or to set a new one if there was no value there before. This is important because it's possible that two threads could simultaneously notice that the object has no lock, and then each would create and successfully acquire its own lock, while ignoring the lock created by the other! This would mean that our synchronization could fail on the first call to a synchronized method of a given object.
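The guarantee is easy to demonstrate even without spawning threads; in this sketch, the second setdefault() call returns the RLock stored by the first call rather than the fresh lock it was offered:

```python
from threading import RLock

obj_dict = {}   # stands in for an instance's __dict__

# First call stores (and returns) a new RLock under '_sync_lock'.
lock_a = obj_dict.setdefault('_sync_lock', RLock())
# Second call offers a *different* RLock, but the stored one wins.
lock_b = obj_dict.setdefault('_sync_lock', RLock())

assert lock_a is lock_b   # every caller ends up with the same lock object
```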

Using the atomic setdefault operation, however, guarantees that no matter how many threads simultaneously detect the need for a new lock, they will all receive



Listing One
(a)
import atexit

def goodbye():
    print "Goodbye, world!"

atexit.register(goodbye)

(b)
import atexit

@atexit.register
def goodbye():
    print "Goodbye, world!"

Listing Two
(a)
class Something(object):
    def someMethod(cls,foo,bar):
        print "I'm a class method"
    someMethod = classmethod(someMethod)

(b)
class Something(object):
    @classmethod
    def someMethod(cls,foo,bar):
        print "I'm a class method"

Listing Three
(a)
def mydecorator(func):
    print "decorating", func
    return func

print "before definition"

@mydecorator
def some_function():
    print "I'm never called, so you'll never see this message"

print "after definition"

(b)
before definition
decorating <function some_function at 0x00A933C0>
after definition

Listing Four
def stupid_decorator(func):
    return "Hello, world!"

@stupid_decorator
def does_nothing():
    print "I'm never called, so you'll never see this message"

print does_nothing

Listing Five
class traced:
    def __init__(self,func):
        self.func = func
    def __call__(__self,*__args,**__kw):
        print "entering", __self.func
        try:
            return __self.func(*__args,**__kw)
        finally:
            print "exiting", __self.func

@traced
def hello():
    print "Hello, world!"

hello()

Listing Six
class SomeClass(object):
    @classmethod
    @traced
    def someMethod(cls):
        print "Called with class", cls

SomeClass.someMethod()

Listing Seven
def traced(func):
    def wrapper(*__args,**__kw):
        print "entering", func
        try:
            return func(*__args,**__kw)
        finally:
            print "exiting", func
    return wrapper

Listing Eight
def require(expr):
    def decorator(func):
        def wrapper(*__args,**__kw):
            assert eval(expr), "Precondition failed"
            return func(*__args,**__kw)
        return wrapper
    return decorator

@require("len(__args)==1")
def test(*args):
    print args[0]

test("Hello world!")

Listing Nine
def author(author_name):
    def decorator(func):
        func.author_name = author_name
        return func
    return decorator

@author("Lemony Snicket")
def sequenceOf(unfortunate_events):
    pass

print sequenceOf.author_name   # prints "Lemony Snicket"

Listing Ten
def require(expr):
    def decorator(func):
        def wrapper(*__args,**__kw):
            assert eval(expr), "Precondition failed"
            return func(*__args,**__kw)
        wrapper.__name__ = func.__name__
        wrapper.__dict__ = func.__dict__
        wrapper.__doc__ = func.__doc__
        return wrapper
    return decorator

Listing Eleven
def synchronized(func):
    def wrapper(self,*__args,**__kw):
        try:
            rlock = self._sync_lock
        except AttributeError:
            from threading import RLock
            rlock = self.__dict__.setdefault('_sync_lock',RLock())
        rlock.acquire()
        try:
            return func(self,*__args,**__kw)
        finally:
            rlock.release()
    wrapper.__name__ = func.__name__
    wrapper.__dict__ = func.__dict__
    wrapper.__doc__ = func.__doc__
    return wrapper

class SomeClass:
    """Example usage"""
    @synchronized
    def doSomething(self,someParam):
        """This method can only be entered by one thread at a time"""


the same RLock object. That is, one setdefault() operation sets the lock, then all subsequent setdefault() operations receive that lock object. Therefore, all threads end up using the same lock object, and thus only one is able to enter the wrapped method at a time, even if the lock object was just created.

Conclusion
Python decorators are a simple, highly customizable way to wrap functions or methods, annotate them with metadata, or register them with a framework of some kind. But, as a relatively new feature, their full possibilities have not yet been explored, and perhaps the most exciting uses haven't even been invented yet. Just to give you some ideas, here are links to a couple of lists of use cases that were posted to the mailing list for the developers working on the next version of Python: http://mail.python.org/pipermail/python-dev/2004-April/043902.html and http://mail.python.org/pipermail/python-dev/2004-April/044132.html.

Each message uses different syntax for decorators, based on some C#-like alternatives being discussed at the time. But the actual decorator examples presented should still be usable with the current syntax. And, by the time you read this article, there will likely be many other uses of decorators out there. For example, Thomas Heller has been working on experimental decorator support for the ctypes package (http://ctypes.sourceforge.net/), and I've been working on a complete generic function package using decorators, as part of the PyProtocols system (http://peak.telecommunity.com/PyProtocols.html).

So, have fun experimenting with decorators! (Just be sure to practice "safe decs," to ensure that your decorators will play nice with others.)

DDJ



Many software applications are about to be turned upside-down by the transition of CPUs from single to multicore implementations.

In new designs, software developers will be tasked with keeping multiple cores busy to avoid leaving performance on the floor. In legacy designs, you will be faced with the challenge of having single-threaded applications run efficiently on multiple cores. Programs will need to serve up code threads that can be dished out to several cores in an efficient manner. Code threading breaks up a software task into subtasks called "threads," which run concurrently and independently.

Threaded code has been the rule in a number of applications for some time, such as storage area networks. Utilizing Hyperthreading Technology from Intel (the company I work for), storage applications deploy concurrent tasks to take advantage of CPU idle time, or CPU underutilized resources, such as when data is retrieved from slow memory. Therefore, tools and expertise are already available to write and optimize threaded code. Operating systems such as Windows XP, QNX, and some distributions of the Linux kernel have been optimized for threading and are ready to support next-generation processors.

Embedded applications are not inherently threaded and may require some software development to prepare for multicore CPUs. In this article, I examine the motivation of CPU vendors to move to multicores, the corresponding software ramifications, and the impact on embedded system developers.

CPU Architecture Terminology
The terminology to describe various incarnations of CPU architecture is complex. Figure 1 depicts the physical renditions of three different multithread technologies.

Figure 1(a) shows a dual-processor configuration. Two individual CPUs share a common Processor Side Bus (PSB) that interfaces to a chipset with a memory controller. Each CPU has its own resources to execute programs. These resources include CPU State registers (CS), Interrupt Logic (IL), and an Execution Unit (EU), also called an Arithmetic Logic Unit (ALU).

Figure 1(b) depicts Hyperthreading Technology (HT), which maintains two threads on one physical CPU. Each thread has its own CPU State registers and Interrupt Logic, while the Execution Unit is shared between the two threads. This means the execution unit is time-shared by both threads concurrently, and the execution unit continuously makes progress on both threads. If one thread stalls, perhaps waiting for an operand to be retrieved from memory, the execution unit continues to execute the other thread, resulting in a more fully utilized CPU. Although Hyperthreading Technology is implemented on a single physical CPU, the operating system recognizes two logical processors and schedules tasks to each logical processor.

A dual-core CPU is shown in Figure 1(c). Each core contains its own dedicated processing resources similar to an individual CPU, except for the Processor Side Bus, which may be shared between the two cores.

All of these CPU implementations require threaded code to fully employ their computing potential. In the future, the dual-core CPU model will be extended to quad-core, containing four cores on a single piece of silicon.

Multithreaded Technology & Multicore Processors
Preparing yourself for next-generation CPUs

CRAIG SZYDLOWSKI

Craig is an engineer for the Infrastructure Processor Division at Intel. He can be contacted at [email protected].




Why the Move to Dual-Core?
Ever increasing clock speed is creating a power dissipation problem for semiconductor manufacturers. The faster clock speeds typically require additional transistors and higher input voltages, resulting in greater power consumption.

The latest semiconductor technologies support more and more transistors. The downside is that every transistor leaks a small amount of current, the sum of which is problematic.

Instead of pushing chips to run faster, CPU designers are adding resources, such as more cores and more cache, to provide comparable or better performance at lower power. Additional transistors are being leveraged to create more diverse capability, such as virtualization technology or security features, as opposed to driving to higher clock speeds. These diverse capabilities ultimately bring more performance to embedded applications within a lower power budget. Dual-core CPUs, for example, can be clocked at slower speeds and supplied with lower voltage to yield greater performance per watt.

Parallelism and Its Software Impact
Multicore processor implementation will have a significant impact on embedded applications. To take advantage of multicore CPUs, programs require some level of migration to a threaded software model and necessitate incremental validation and performance tuning. There are kernel or system threads managed by the operating system and user threads maintained by programmers. Here I focus on user threads.

You should choose a threaded programming model that suits the parallelism inherent to the application. When there are a number of independent tasks that run in parallel, the application is suited to functional decomposition. Explicit threading is usually best for functional decomposition. When there is a large set of independent data that must be processed through the same operation, the application is suited to data decomposition. Compiler-directed methods, such as OpenMP (http://www.openmp.org/), are designed to express data parallelism. The following example describes explicit threading and compiler-directed methods in more detail.

To exploit multicore CPUs, you identify the parallelism within your programs and create threads to run multiple tasks concurrently. The vision-inspection system in Figure 2 illustrates the concept of threading with respect to functional and data parallelism. You must also decide upon which threading models to implement — explicit threading or compiler-directed threading.

The vision-inspection system in Figure 2 measures the size and placement of leads onto a semiconductor package. The system runs several concurrent function tasks, such as interfacing to a human, controlling a conveyer belt, capturing images of the leads, processing the lead images, and detecting defects and transferring the data to a storage area network. These tasks represent functional parallelism because they run at the same time, execute as individual threads, and are relatively independent. These tasks are asynchronous to each other, meaning they don't start and end at the same time.

The advantage of threading these functional tasks is that the inspection application doesn't lock up when other tasks or functions run, so the machine operator, for example, experiences a more responsive application.

The processing of the semiconductor package images is well-suited to data parallelism because the same algorithm is run on a large number of data elements. In this case, the defect detection algorithm processes arrays of pixels by looping and applying the same inspection operation to independent sets of pixels. Each set of pixels is processed by its own thread.

For either functional or data parallelism, you can write explicit threads to instruct the operating system to run these tasks concurrently. An explicit thread is purposely coded instructions using thread libraries such as Pthreads or Win32 threading APIs. You are responsible for creating threads manually by encapsulating independent work into functions that are mapped to threads. Like memory allocation, thread creation must also be validated by you.
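A rough sketch of explicit functional decomposition, using Python's standard threading module in place of Pthreads or Win32 (the stand-in task functions are invented for illustration):

```python
import threading

results = {}

def capture_images():
    results['capture'] = 'images captured'   # stand-in for the capture task

def control_conveyor():
    results['conveyor'] = 'belt advanced'    # stand-in for the conveyor task

# Encapsulate each independent task in a function and map it to a thread.
threads = [threading.Thread(target=capture_images),
           threading.Thread(target=control_conveyor)]
for t in threads:
    t.start()               # the tasks now run concurrently
for t in threads:
    t.join()                # wait for every task to finish
```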

Although explicit threads are general purpose and powerful, their complexity may make compiler-directed threading a more appealing alternative. An example of compiler-directed threading is OpenMP, which is an industry-standard set of compiler directives. In OpenMP, you use pragmas to describe parallelism to the compiler; for example:

#pragma omp parallel for private(pixelX, pixelY)
for (pixelX = 0; pixelX < imageHeight; pixelX++)
{
    for (pixelY = 0; pixelY < imageWidth; pixelY++)
    {
        newImage[pixelX][pixelY] = ProcessPixel(pixelX, pixelY, image);
    }
}

The pragma omp says this is an opportunity for OpenMP parallelism. The parallel keyword tells the compiler to create threads. The for keyword tells the compiler the iterations of the next for loop will be divided amongst those threads. The private clause lists variables that need to be kept private for each thread to avoid race conditions and data corruption.

The compiler creates the spawned threads as in Figure 3. Notice the spawned threads are all created and retired at the same time, somewhat resembling the tines of a fork. There is an explicit parent-child relationship that is not necessary with threaded libraries. This is called a "Fork-Join" model and is a required characteristic for OpenMP parallelism. OpenMP pragmas are less general than threaded libraries, but they are less complex because the compiler creates the underlying parallel code for the multiple threads. OpenMP is supported by various compilers, allowing the threaded code to be transportable, whereas threaded libraries typically have allegiance to specific operating systems.
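The same fork-join shape can be sketched outside OpenMP; this loose Python analogue (standard-library ThreadPoolExecutor, made-up image data) divides the loop's row indices among pooled worker threads and joins before continuing:

```python
from concurrent.futures import ThreadPoolExecutor

IMAGE_HEIGHT, IMAGE_WIDTH = 4, 3
image = [[x + y for y in range(IMAGE_WIDTH)] for x in range(IMAGE_HEIGHT)]

def process_row(x):
    # Same operation applied to independent data: per-row locals stay
    # private to the worker, mirroring OpenMP's private() clause.
    return [image[x][y] * 2 for y in range(IMAGE_WIDTH)]

# Fork: the pool's workers divide the row indices among themselves.
# Join: map() yields every processed row, and the 'with' block waits
# for all workers before execution continues.
with ThreadPoolExecutor() as pool:
    new_image = list(pool.map(process_row, range(IMAGE_HEIGHT)))
```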

Parallelism Debug
Whether threads are created explicitly, by compiler directive, or by any other method, they need to be tested to ensure no race conditions exist. With a race condition, you have mistakenly assumed a particular order of execution, but didn't guarantee that order. In embedded applications, processes are often asynchronous, which means a bug may be dormant during validation testing, permitting the code to work nearly all the time.

Figure 1: Three multithread technologies. (a) Dual processor: two CPUs, each with its own CPU State registers (CS), Interrupt Logic (IL), and Execution Units (ALUs), sharing a Processor Side Bus. (b) Hyperthreading Technology (HT): two sets of CPU State registers and Interrupt Logic sharing one set of Execution Units. (c) Dual core: two cores on one die, each with its own CS, IL, and EU, sharing the Processor Side Bus.

A race condition may be caused by a storage conflict. Two threads could be overwriting a particular memory location, or a thread may presume another thread completed its work on a particular variable, leading to the use of corrupt data. Access to common data must be synchronized to avoid data loss. Synchronization can be implemented with a simple status word, called a "semaphore," that indicates the state of the data. A thread takes control of the data by writing "0" to the status word, whereas writing "1" to the status word releases control, allowing another thread to access the variable. As embedded applications are often interrupt driven, it may be useful to implement a protected read-modify-write sequence to guarantee a thread's operations on a variable are not disturbed by another process such as an interrupt service routine.
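A sketch of such a protected read-modify-write sequence, written with Python's standard threading.Lock standing in for the status word:

```python
import threading

counter = 0
counter_lock = threading.Lock()   # plays the role of the status word

def increment(times):
    global counter
    for _ in range(times):
        # Holding the lock keeps the read-modify-write below from being
        # interleaved with another thread's update of 'counter'.
        with counter_lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(10000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# counter is now exactly 40000; unsynchronized updates could lose counts.
```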

There are sophisticated tools available to test for race conditions. The Intel Thread Checker (http://www.intel.com/ids/) is an automated runtime debugger that checks for storage conflicts and looks for places where threads may lock or stall. It identifies memory locations that are accessed by one thread, followed by an unprotected access by another thread, which exposes the program to data corruption. The Thread Checker is a dynamic analysis tool and is, therefore, dataset dependent. As such, if the dataset does not exercise certain program flows, the tool is not capable of checking that code portion. For embedded applications, it is important to create a dataset that simulates the relevant asynchronous processes.

Finding race conditions can be very difficult and time consuming. Thread Checker can easily find these conflicts, even when the conflict is generated by code instances in different call stacks and many thousands of lines apart.

Performance Tuning

Once the code has been tested and verified to be executing correctly, performance optimization may begin. Performance optimization should be limited to the critical path of execution. There is little return for tuning code with no impact on overall system performance.

To maximize the performance of multicore CPUs, it is often necessary to ensure that the workload is balanced between the cores. Load imbalance limits parallel efficiency and scalability because some processor resources will be idle.

Synchronization can also limit performance by creating bottlenecks and overhead. Although synchronization helps guarantee data integrity, it serializes the program flow. Synchronization requires some threads to wait on other threads before the program flow can proceed, resulting in idle processor resources.

To assist performance tuning, the Intel Thread Profiler (http://www.intel.com/ids/) lets you check load balance, lock contention, synchronization bottlenecks, and parallel overhead. This tool can drill down to source code for threads created by OpenMP or thread libraries. The profiler identifies the critical path in the program and indicates the processor utilization by thread. You can view the threads in the critical path as well as the CPU time spent in each thread.

Impact of Multicore CPUs on Embedded Systems

Hardware and software developers of embedded systems will be impacted by the move to multicore CPUs. Hopefully, board designers will find multicore CPUs alleviate the thermal issues of today's high-performance processors, while providing comparable performance. Programmers may need to adapt to new programming models that include threaded software. Although creating, checking, and tuning threads may initially be arduous, it will provide you with more control over the resources of the CPU and possibly decrease program latencies. Those developing real-time systems can partition work amongst multiple cores and assign priorities in order to get critical tasks completed faster.

Software developers who fail to prepare for the transition to multicore CPUs may either get pigeonholed onto older CPUs or risk performance issues from unoptimized code.

Many tools are available to help the transition to threaded code. Through multithreaded capabilities such as Hyperthreading Technology, many developers already have experience with specialized tools and threaded programming models. This background and code development will provide an immediate payback when these applications run on dual-core CPUs.

Multicore has also raised the question of software licensing and the associated costs that customers will have to pay. Some software vendors have considered charging license fees on a per-core basis, charging more for dual- or multicore systems. Against this tide, Microsoft has announced that its software will be licensed on a per-processor-package basis. This means only a single license is needed, regardless of how many cores are contained within the processor.

The groundwork is being laid for the transition to multicore CPUs in 2005. Tools are available to help you develop efficient and reliable threaded code. Embedded application providers should plan their move to threaded programming models to fully utilize the performance of next-generation multicore CPUs.

DDJ


Figure 3: Fork-Join model. A master thread spawns worker threads and later joins them.
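The Fork-Join model of Figure 3 can be sketched as follows (an illustrative Python analogue; OpenMP expresses the same pattern with compiler directives, and the function names here are mine):

```python
import threading

def fork_join(work, n_threads):
    """Master thread forks workers, then joins (waits for) all of them."""
    results = [None] * n_threads
    def run(i):
        results[i] = work(i)
    threads = [threading.Thread(target=run, args=(i,)) for i in range(n_threads)]
    for t in threads:   # fork: spawn the worker threads
        t.start()
    for t in threads:   # join: master blocks until every worker finishes
        t.join()
    return results

print(fork_join(lambda i: i * i, 4))  # [0, 1, 4, 9]
```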

Figure 2: Typical vision-inspection system. Components shown: conveyor belt, image capture, image processing, defect detection, system controller, human interface, and storage area network.


Most nontrivial programs are event based. The events can be UI events, incoming network packets, or hardware interrupts. These events are usually translated to software events in the form of Windows messages, callback functions, delegates, and the like. The Observer design pattern suggests one way of doing event-driven programming. The Model-View-Controller (MVC) is a GUI-specific architecture for updating multiple views of a data model and handling UI events. Many large-scale programs are heavily event based and employ some kind of event propagation system. In this article, I examine one of my favorite event propagation designs: interface-based publish-subscribe event propagation. Note that native .NET delegates and events are a different implementation of publish-subscribe with different semantics.

The idea behind interface-based publish-subscribe event propagation is this:

• There is a globally accessible Switchboard object that every event goes through.

• Event producers send events to the Switchboard (publishing).

• Event consumers subscribe for specific event interfaces.

• Whenever the Switchboard receives an event, it propagates it to all the subscribers.

• Event consumers may unsubscribe at any time.
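As a compact, language-neutral sketch of this mechanism (illustrative Python; the article's actual implementation is the C# Switchboard in Listing Two, and all names here are mine):

```python
class Switchboard:
    """Globally accessible hub: producers publish, consumers subscribe per interface."""
    def __init__(self):
        self._subscribers = {}  # interface name -> list of consumers

    def subscribe(self, interface, consumer):
        self._subscribers.setdefault(interface, []).append(consumer)

    def unsubscribe(self, interface, consumer):
        self._subscribers.get(interface, []).remove(consumer)

    def publish(self, interface, event, *args):
        # Propagate the event to every current subscriber of the interface.
        for consumer in self._subscribers.get(interface, []):
            getattr(consumer, event)(*args)

class Logger:
    def __init__(self):
        self.seen = []
    def OnThis(self, x):
        self.seen.append(("OnThis", x))

board = Switchboard()
logger = Logger()
board.subscribe("ISomeInterface", logger)
board.publish("ISomeInterface", "OnThis", 5)
print(logger.seen)  # [('OnThis', 5)]
```

Note that this dynamic dispatch sacrifices the build-time type safety the article lists as an upside; the C# version gets that safety precisely because the Switchboard implements the event interfaces.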

The upsides of this approach include:

• Complete decoupling of event producers and event consumers.

• Allows passing any type of data with the event.

• Explicit event names.

• Type safety; no casting whatsoever.

• Missing/misspelled/wrong-signature events and event handlers are detected at build time.

• Allows grouping of related events.

• Allows subscription/unsubscription of an entire interface in one call.

• Easy debugging even when events are chained (meaningful call stack).

On the other hand, the downsides are:

• The Switchboard object is a maintenance nightmare.

• Event consumers must implement a handler for every event on an interface they subscribe to.

• Sometimes, it isn't easy to decide how to group events into interfaces.

• It isn't easy to queue events, persist events, or otherwise handle events as objects.

The EventDemo Program

Listings One through Four present an app that demonstrates the publish-subscribe mechanism. (The complete listings and related files are available electronically; see "Resource Center," page 5.) Listing One contains the definition of the event interfaces. There are two interfaces: ISomeInterface, which has methods with different signatures; and IAnotherInterface, which has methods with no arguments (just to show that the code works with multiple interfaces). The [EventInterface] custom attribute that decorates each interface is the key for detecting the event interfaces during code generation. I'll get back to it shortly. The return value of all the event methods is void. This isn't mandatory, and EventSwitchboard can handle arbitrary return values just like it handles arbitrary lists of arguments. The reason is that the essence of event propagation is that senders have no idea who (if anyone) is listening and handling their events; hence, they can't take any meaningful action by intercepting a return value. There is also the issue of getting multiple return values from multiple handlers. If the sender needs some information from a receiver of an event, then these two objects actually exchange messages according to some protocol, and that is not event propagation.

Listing Two is the Switchboard class, which is derived from all the event interfaces (C#/CLR allows single implementation inheritance but multiple interface inheritance). It has a pair of Subscribe/Unsubscribe methods for each interface, an ArrayList field per interface that holds all the subscribers to this interface, and an implementation of each interface method that simply forwards the event to every subscriber. Granted, this is less than exciting, especially when you consider real-life production


Battle of the Code Generators

Template-based versus CodeDOM

PROGRAMMER'S TOOLCHEST

GIGI SAYFAN

Gigi is a software developer specializing in object-oriented and component-oriented programming using C++. He can be contacted at [email protected].

"Reflection plus code generation are a dynamic duo"


systems that might have tens/hundreds of event interfaces with hundreds/thousands of events. This class is bad, real bad. The only good thing about it is that it is a Singleton, so you don't have to deal with more than one (unless you want to, and sometimes you do).

Listing Three contains three EventHandlers that implement various event interfaces. All they do is write to the console some text that identifies them and the event they received. EventHandler_1 and EventHandler_2 implement one interface each, and EventHandler_3 implements two interfaces. Trivial so far.

The MainClass in Listing Four gets the show on the road. It gets a reference to the Switchboard. It creates instances of all the handlers and starts sending events to the Switchboard, which diligently forwards each event to its subscribers, which promptly blurb to the console (see Figure 1). Finally, Main unsubscribes all the event handlers. This is not necessary in this example, but in real systems where subscribers are created and destroyed and events fly back and forth, you'd better unsubscribe every object that needs to be destroyed; otherwise, it continues to live forever due to the reference the Switchboard holds. In unmanaged C++ systems, it is easy to destroy an object without unsubscribing, and the next event that the Switchboard tries to deliver crashes the system.

The problem is that the Switchboard class is a world-class abomination. It must be modified every time someone, somewhere adds a new event or modifies an existing event. It is completely inadequate if third-party developers need to create new events and use your Switchboard. If that's not enough, the Switchboard's code is boring and error-prone due to copy-and-paste reuse tactics. It is easy for the Switchboard to receive event A and propagate event B. This kind of error can be difficult to track in a dynamic system, especially if it's multithreaded.

The solution is tied up with the capability of the Switchboard class to be generated automatically without human intervention. Here is the general idea: Every event interface is annotated with a custom [EventInterface] attribute. A special switchboard generator program traverses all the candidate assemblies that may contain interfaces. The generator reads the metadata information of every type in the candidate assemblies, detects the event interfaces, and for each event interface, it magically conjures the appropriate code in the Switchboard class (which it creates). The SwitchboardGenerator can be run at development time against a known set of assemblies as a prebuild step, or it can be run at runtime to generate the Switchboard class on-the-fly to address third-party assemblies that send or listen to events. In the latter case, a more sophisticated approach is necessary because the third-party assemblies must be programmed against some existing object that implements the event interfaces they use. This requires a dummy object that is replaced with the Switchboard after it is generated.

What Is Code Generation?

Code generation has different meanings to different people (or different programs). For purposes here, I define it as generating source code in some target programming language from some simpler input (domain-specific languages, for instance) via a code-generator program, plus some optional templates, plus some optional configuration files. This is the most interesting type of code generation for programmers. The generated code should not be modified manually by humans (it is okay to provide it as input to yet another code generator or some post-processing program), and it should not be checked into the source-control repository; it should be treated as an intermediate product of the build process. Finally, you probably want to keep the generated code around for debugging purposes.

Creating and using code generators is fun, especially compared to performing the same boring task manually again and again. It definitely raises the level of abstraction with which you understand the problem you try to solve. It also lets you generate efficient code targeted for a specific situation.

Code Generation versus Data-Driven Programming

With data-driven programming, the flow of the program is controlled by external data and is not hard coded. Code generation and data-driven programming are similar but have some important differences.

Data-driven programming:

• Is more flexible (decisions can be made according to real-time data).

• Incurs performance overhead.

• Is much more difficult to debug.

Code generation:

• Is less flexible in general (all decisions must be made at code-generation time).

• Delivers excellent performance (on par with hand-written code in general).

• Allows potentially excellent debugging (depends on your attention to detail when generating code).

The boundaries are not clear. Code generation can generate data-driven code. Data-driven code can decide to compile some code before execution, and so on. For example, the CLR's JIT compiler generates code at load time (on a function-by-function basis) and can perform all kinds of interesting optimizations according to runtime conditions.

Every program that parses command-line arguments or reads an external configuration file or registry entry is probably data-driven to a degree.

CLR Metadata, Custom Attributes, and Reflection

The .NET platform elevates the programming model, tries to solve pervasive problems, and provides countless services, including: a modern verifiable type system; versioning (side-by-side installation); deployment (XCOPY installs); code access security; streamlined interlanguage interoperability (including inheritance); streamlined intraprocess, interprocess, and cross-machine communication (AppDomains and remoting); and a huge and consistent class library. Many of these facilities, and especially the almost transparent way by which they are exposed to you, are based on the rich metadata embedded in every assembly, which is available to the (JIT) compiler and your code at runtime. The metadata is stored in a binary format in the CLR Portable Executable (PE) file when you compile code. The key observation here is that assemblies are self-describing. When an assembly is loaded into memory, the CLR loader (and JIT compiler) has all the information it needs right there in the assembly PE file. The assembly's metadata is also loaded into memory and is


Figure 1: Forwarding events to subscribers.

Figure 2: Storing attributes.


available for runtime inspection by the CLR and your code. This is a huge jump from the COM world, where you had to mess around with registry entries, IDL files, and type libraries.

The CLR lets you annotate elements of code with attributes: declarative tags that contain additional information and are available at runtime through reflection. The compiler emits attributes into the assembly's metadata. There are predefined attributes and custom attributes you can define. Take another look at Listing One. The [EventInterface] attribute is a custom attribute. Figure 2 shows how the attribute is stored as part of the metadata of the ISomeInterface interface. The CLR uses many predefined attributes such as [Serializable] and [WebMethod]. Attributes are actually instances of classes that derive from the System.Attribute base class. Listing Five contains the definition of [EventInterface]. Attribute classes may be named arbitrarily but, by convention, they always have an Attribute suffix. You can omit the Attribute suffix when using the attribute (hence, [EventInterface] and not [EventInterfaceAttribute]). When defining an attribute, you have to use yet another attribute, AttributeUsage, which describes how the attribute is supposed to be used. You can specify what type of element your attribute may annotate, whether multiple instances of your attribute are allowed per code element, and whether the attribute may be inherited by subclasses (applies only to class attributes). The [EventInterface] attribute applies only to interfaces (AttributeTargets.Interface) and doesn't allow multiple instances (AllowMultiple = false).
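The pattern of tagging a type with an attribute and discovering it later via reflection can be mimicked in Python with a marker decorator (an illustrative analogue of [EventInterface], not the CLR mechanism; all names here are mine):

```python
def event_interface(cls):
    """Marker 'attribute': tags a class so a generator can discover it later."""
    cls._is_event_interface = True
    return cls

@event_interface
class ISomeInterface:
    def OnThis(self, x): ...
    def OnThat(self, s): ...

class NotAnEventInterface:
    pass

# Reflection-style scan: keep only the types carrying the marker.
candidates = [ISomeInterface, NotAnEventInterface]
event_interfaces = [c for c in candidates
                    if getattr(c, "_is_event_interface", False)]
print([c.__name__ for c in event_interfaces])  # ['ISomeInterface']
```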

The reflection API lets you drill down into the loaded assemblies and discover the types they contain. For each type, you can discover constructors, methods, interfaces, and events. For each method, you can discover its return value, name, and parameters. In addition, you can access all the attributes, predefined and custom, which annotate every element. Listing Six demonstrates how to dump an entire assembly to the console. The API is straightforward: You call GetXXX methods to get collections of objects, such as Type, MethodInfo, and ParameterInfo, then you iterate through the collection and can access properties such as MethodInfo.Name, MethodInfo.ReturnType, and ParameterInfo.DefaultValue. The reflection API also allows invoking methods on types and instances using information discovered at runtime via the Type.InvokeMember method.

The SwitchboardGenerator MainClass

SwitchboardGenerator is a program (available electronically; see "Resource Center," page 5) that generates a Switchboard class implementation that supports arbitrary event interfaces. It accepts various arguments on the command line, so much of its operation can be configured without actually modifying the code. How does it work? The input consists of the full path of the generated Switchboard class, the namespace of the generated Switchboard class, the custom attribute type name that identifies event interfaces, and a list of paths of assemblies that need to be scanned for event interfaces. SwitchboardGenerator iterates over the types in every assembly; for each interface it finds, it checks to see that it is annotated with the custom attribute (passed on the command line) and adds the matching interfaces to a list. After compiling the interface list, it launches the selected code generator dynamically and invokes its Generate method. The naming convention I use is that all code generators are called "SwitchboardGenerator" and support a static 'Generate' method, but they belong to different namespaces. The namespace is passed on the command line, and the full name of the selected code generator is {namespace}.SwitchboardGenerator (see Listing Six). The code generators use the interface list to create the corresponding Switchboard class. SwitchboardGenerator expects five arguments on the command line in this order: the qualified filename of the generated Switchboard class, the namespace that contains it, the custom attribute (including namespace) that identifies event interfaces, which code generator to use (CodeDOM or TextTemplate), and the assembly that contains the event interfaces. Figure 3 illustrates a short session with SwitchboardGenerator.

Template-Based Code Generator

The core of the template-based code generator is simple and doesn't rely on any CLR- or Windows-specific feature. I have used similar generators on UNIX-based systems and embedded systems (such as PlayStation). Usually, I use Python to implement it since that language's string manipulation capabilities are excellent. However, since I wanted to contrast it against the CodeDOM generator, I implemented it in C#. The goal is to simply generate some code. Most of it is just boilerplate code but requires embedding some dynamic parameters. The solution is to have a text template that contains the boilerplate code, have special placeholders for the dynamic parameters (à la sprintf masks), and replace them at generation time. Listings Eight, Nine, and Ten (available electronically) contain the templates I used to generate the Switchboard class in Listing Two. The $$$-something-$$$ is my notation for placeholders that should be replaced when generating the code. This is pretty much a glorified sprintf, but generating the text for the placeholders is not trivial. Listing Eight contains the Switchboard class skeleton. The $$$-subscribe_unsubscribe_methods-$$$ and $$$-event_methods-$$$ are created using multiple instances of the templates in Listing Nine and Listing Ten, respectively. All the information necessary to replace the placeholders with real data is available through the reflection API from the interface list provided to the generator. Note that if I wanted to decouple the code generation completely from the CLR, I could extract this information and provide it to a generic generator through a dictionary of key=value pairs, but preparing it would be a lot of unnecessary work when the reflection API provides a nice object model. As you recall, the MainClass (Listing Six) iterated over all the interfaces and, if they had the custom [EventInterface] attribute, added them to the list. The generator (Listing Eleven, available electronically) walks through this list and, for each interface, updates the base interfaces (the Switchboard must derive from every event interface to forward them) and creates subscribe and unsubscribe methods; for each event of the interface, it creates


Figure 3: Using the SwitchboardGenerator program.


Listing One

using System;
namespace EventDemo
{
    [EventInterface]
    public interface ISomeInterface
    {
        void OnThis(int x);
        void OnThat(string s);
        void OnTheOther(bool b, string s);
    };
    [EventInterface]
    public interface IAnotherInterface
    {
        void OnBim();
        void OnBam();
        void OnBom();
    };
}

Listing Two

// Auto-generated Switchboard class
using System.Collections;
using System.Diagnostics;
namespace EventDemo
{
    class Switchboard :
        EventDemo.ISomeInterface,
        EventDemo.IAnotherInterface
    {
        public static Switchboard Instance
        {
            get { return m_instance; }
        }
        public bool Subscribe(EventDemo.ISomeInterface sink)
        {
            if (m_someInterfaceList.Contains(sink))
            {
                Debug.Assert(false);
                return false;
            }
            m_someInterfaceList.Add(sink);
            return true;
        }
        public bool Unsubscribe(EventDemo.ISomeInterface sink)

a corresponding event handler and finally creates a subscriber list for this interface. The most interesting part is creating the event methods because the signature of each event is arbitrary. The CreateEvent method has to gather the following information: return type, event name, parameter types (list of all parameter types), parameter names (list of all parameter names), the interface name itself, and the name of the subscriber list. The CreateEvent method traverses the MethodInfo object to extract most of this information (requires access to properties and iterating over the ParameterInfo collection) and uses the GetListName helper method.
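The template technique itself is language-agnostic; since such generators are often written in Python (as the author notes), here is a minimal Python sketch using the article's $$$-placeholder-$$$ notation. The template text and names below are illustrative, not the article's actual Listings Eight through Ten:

```python
SUBSCRIBE_TEMPLATE = """\
public bool Subscribe($$$-interface-$$$ sink)
{
    if (m_$$$-list-$$$.Contains(sink)) return false;
    m_$$$-list-$$$.Add(sink);
    return true;
}"""

def expand(template, values):
    """Glorified sprintf: replace each $$$-name-$$$ placeholder with its value."""
    for name, value in values.items():
        template = template.replace("$$$-" + name + "-$$$", value)
    return template

code = expand(SUBSCRIBE_TEMPLATE,
              {"interface": "EventDemo.ISomeInterface",
               "list": "someInterfaceList"})
print(code)
```

The generator would run this expansion once per event interface, concatenating the results into the class-skeleton template's own placeholders.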

CodeDOM Code Generator

The System.CodeDOM namespace contains myriad classes that let you generate an abstract object model that represents some code, and generate source files from it. There is also a System.CodeDOM.Compiler namespace that lets you compile code at runtime, but I don't deal with that namespace in this article. Here are the steps to generate code using the CodeDOM:

1. Create a root CodeNamespace.
2. Add types to the namespace.
3. Add members to the types (methods, properties, events, and fields).
4. Add code statements to the methods.
5. Request a specific provider and a code generator from that provider (CSharpCodeProvider, for instance).
6. Generate the code and write it to a file.
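Python has no CodeDOM, but its ast module supports the same build-a-tree-then-emit-source style, which makes the contrast with plain text templates easy to see (an illustrative analogue; all names below are mine):

```python
import ast

# CodeDOM-style generation: work on a code model rather than raw text.
# Start from a parsed skeleton and rewrite its placeholder names, then
# unparse the tree back into source.
skeleton = ast.parse(
    "def OnEVENT(self, *args):\n"
    "    for subscriber in self.SUBSCRIBERS:\n"
    "        subscriber.OnEVENT(*args)\n")

def specialize(tree, event, subscriber_list):
    """Rewrite the skeleton's placeholder names for one concrete event."""
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == "OnEVENT":
            node.name = event
        elif isinstance(node, ast.Attribute):
            if node.attr == "OnEVENT":
                node.attr = event
            elif node.attr == "SUBSCRIBERS":
                node.attr = subscriber_list
    return ast.unparse(tree)  # requires Python 3.9+

print(specialize(skeleton, "OnThis", "some_interface_list"))
```

As with CodeDOM, you gain a structured model at the cost of verbosity, and you surrender fine control over the emitted formatting to the unparser.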

The promise of the CodeDOM is enormous. You may generate code from the same CodeDOM graph to multiple languages. As long as a language provider exists, you may generate code for that language. Internally, the web services infrastructure uses CodeDOM to generate client-side proxies for you. However, the reality is less than perfect. The CodeDOM doesn't cover every CLR feature (unary operators, for example), it doesn't give you complete control over the format of the generated code, and some (many?) language

constructs are missing. For example, there is no way to generate C# foreach or while statements using the CodeDOM. The only iteration statement is 'for'. The worst drawback of the CodeDOM is its extreme verbosity. This may be okay for automatically generated code you never see and debug (like web services proxies), but if you want to integrate code generation into your software-development arsenal, it just doesn't cut it. You should be able to debug easily through the generated code, and you should be able to easily modify the code generator itself.

The CodeDOM needs a serious facelift to become useful for custom code generation. Listing Twelve (available electronically) contains the CodeDOM switchboard generator. The basic structure is similar to the template-based code generator: It gets the information it needs through the reflection API from the interface list, and it creates a skeleton Switchboard class, subscribe/unsubscribe methods, and event methods. However, the code generation itself is done by creating CodeDOM classes and painstakingly composing them together. The end result is not on par with the complete control and ease of modification that the template-based code generator affords. For example, I like to put the private fields at the bottom of the class definition, but the CodeDOM code generator insists on putting them at the top; and the whitespace is not exactly the way I like, but there is nothing I can do about it. See Listing Thirteen (available electronically) for the results of the CodeDOM code generator. Note, in particular, how ugly the 'for' construct with the explicit GetEnumerator() and MoveNext() looks. Another problem that undermines the multilanguage capability of the CodeDOM is the snippet classes (CodeSnippetStatement, CodeSnippetExpression, CodeSnippetTypeMember). These classes represent literal text that is pasted into the middle of the generated code as is. Of course, you can't write a cross-language snippet because every language has a different syntax. This means that using snippets destroys cross-language code generation. Since some language constructs are not implemented yet, you must sometimes use snippets. Moreover, because creating a proper CodeDOM tree is so complicated, sometimes the easy way is just to use snippets (I shamelessly admit I did it).

Using a Code Generator

The code generators I present here reflect a successfully built assembly. If I had modified an event interface and tried to regenerate the Switchboard, the process would fail. The reason is that the previously generated Switchboard is part of the same assembly (EventDemo), and it doesn't work with the modified event interface. To regenerate it, I have to fix it manually first. What? That's right; it doesn't make much sense. The solution is to separate the event interfaces (or in general, all the input to the code generator) from the Switchboard class (or in general, all the output of the code generator). This can be done by putting either the event interfaces or the Switchboard class in a different assembly. I recommend putting the interfaces in a separate assembly and putting the Switchboard class in the same assembly as the rest of the application (EventDemo). Now, you don't have the chicken-and-egg problem. The assembly that contains the Switchboard class should reference and depend (build-wise) on the assembly that contains the interfaces. The SwitchboardGenerator program itself should be invoked as a build step of the main assembly that contains the Switchboard class. This way, an up-to-date Switchboard class is generated before starting to build its assembly.

Conclusion

Reflection plus code generation are a dynamic duo. They can be used for many purposes, and not just to write various stubs and proxies. Use them wisely and forget about CodeDOM until Longhorn ships (at least). Maybe by then, some genius will find a way to make it simpler.

DDJ



        {
            if (!m_someInterfaceList.Contains(sink))
            {
                Debug.Assert(false);
                return false;
            }
            m_someInterfaceList.Remove(sink);
            return true;
        }
        public bool Subscribe(EventDemo.IAnotherInterface sink)
        {
            if (m_anotherInterfaceList.Contains(sink))
            {
                Debug.Assert(false);
                return false;
            }
            m_anotherInterfaceList.Add(sink);
            return true;
        }
        public bool Unsubscribe(EventDemo.IAnotherInterface sink)
        {
            if (!m_anotherInterfaceList.Contains(sink))
            {
                Debug.Assert(false);
                return false;
            }
            m_anotherInterfaceList.Remove(sink);
            return true;
        }
        public void OnThis(System.Int32 x)
        {
            foreach (EventDemo.ISomeInterface subscriber in m_someInterfaceList)
                subscriber.OnThis(x);
        }
        public void OnThat(System.String s)
        {
            foreach (EventDemo.ISomeInterface subscriber in m_someInterfaceList)
                subscriber.OnThat(s);
        }
        public void OnTheOther(System.Boolean b, System.String s)
        {
            foreach (EventDemo.ISomeInterface subscriber in m_someInterfaceList)
                subscriber.OnTheOther(b, s);
        }
        public void OnBim()
        {
            foreach (EventDemo.IAnotherInterface subscriber in m_anotherInterfaceList)
                subscriber.OnBim();
        }
        public void OnBam()
        {
            foreach (EventDemo.IAnotherInterface subscriber in m_anotherInterfaceList)
                subscriber.OnBam();
        }
        public void OnBom()
        {
            foreach (EventDemo.IAnotherInterface subscriber in m_anotherInterfaceList)
                subscriber.OnBom();
        }
        ArrayList m_someInterfaceList = new ArrayList();
        ArrayList m_anotherInterfaceList = new ArrayList();
        static Switchboard m_instance = new Switchboard();
    }
}

Listing Three

using System;
namespace EventDemo
{
    class EventHandler_1 : ISomeInterface
    {
        public void OnThis(int x)
        {
            Console.WriteLine("{0} received OnThis({1})", GetType().Name, x);
        }
        public void OnThat(string s)
        {
            Console.WriteLine("{0} received OnThat({1})", GetType().Name, s);
        }
        public void OnTheOther(bool b, string s)
        {
            Console.WriteLine("{0} received OnTheOther({1}, {2})",
                GetType().Name, b, s);
        }
    }
    class EventHandler_2 : IAnotherInterface
    {
        public void OnBim()
        {
            Console.WriteLine("{0} received OnBim()", GetType().Name);
        }
        public void OnBam()
        {
            Console.WriteLine("{0} received OnBam()", GetType().Name);
        }
        public void OnBom()
        {
            Console.WriteLine("{0} received OnBom()", GetType().Name);
        }
    }
    class EventHandler_3 : ISomeInterface, IAnotherInterface
    {
        public void OnThis(int x)
        {
            Console.WriteLine("{0} received OnThis({1})", GetType().Name, x);
        }
        public void OnThat(string s)
        {
            Console.WriteLine("{0} received OnThat({1})", GetType().Name, s);
        }
        public void OnTheOther(bool b, string s)
        {
            Console.WriteLine("{0} received OnTheOther({1}, {2})",
                GetType().Name, b, s);
        }
        public void OnBim()
        {
            Console.WriteLine("{0} received OnBim()", GetType().Name);
        }
        public void OnBam()
        {
            Console.WriteLine("{0} received OnBam()", GetType().Name);
        }
        public void OnBom()
        {
            Console.WriteLine("{0} received OnBom()", GetType().Name);
        }
    }
}

Listing Four

using System;
namespace EventDemo
{
    class MainClass
    {
        /// <summary>
        /// The main entry point for the application.
        /// </summary>
        [STAThread]
        static void Main(string[] args)
        {
            Switchboard s = Switchboard.Instance;
            EventHandler_1 eh_1 = new EventHandler_1();
            EventHandler_2 eh_2 = new EventHandler_2();
            EventHandler_3 eh_3 = new EventHandler_3();
            s.Subscribe(eh_1 as ISomeInterface);
            s.Subscribe(eh_2 as IAnotherInterface);
            s.Subscribe(eh_3 as ISomeInterface);
            s.Subscribe(eh_3 as IAnotherInterface);
            s.OnThis(5);
            s.OnThat("Yeahh, it works!!!");
            s.OnTheOther(true, "Yeahh, it works!!!");
            s.OnBim();
            s.OnBam();
            s.OnBom();
            s.Unsubscribe(eh_1 as ISomeInterface);
            s.Unsubscribe(eh_2 as IAnotherInterface);
            s.Unsubscribe(eh_3 as ISomeInterface);
            s.Unsubscribe(eh_3 as IAnotherInterface);
        }
    }
}

Listing Five

using System;
namespace EventDemo
{
    [AttributeUsage(AttributeTargets.Interface, AllowMultiple = false)]
    public class EventInterfaceAttribute : System.Attribute
    {
    }
}

Listing Six

using System;
using System.Reflection;
namespace ReflectionDemo
{
    class AssemblyDumper
    {
        static public void Dump(Assembly a)
        {
            Type[] types = a.GetTypes();
            foreach (Type type in types)
                DumpType(type);
        }
        static private void DumpType(Type type)
        {
            Console.WriteLine(" ---- {0} ----", type.FullName);
            MethodInfo[] methods = type.GetMethods();
            foreach (MethodInfo method in methods)
            {
                Console.Write("{0} {1}(", method.ReturnType, method.Name);
                ParameterInfo[] parameters = method.GetParameters();
                foreach (ParameterInfo parameter in parameters)
                {
                    Console.Write("{0} {1}",
                        parameter.ParameterType.Name, parameter.Name);
                    if (parameter.IsOptional)
                        Console.Write(" = {0}",
                            parameter.DefaultValue.ToString());
                    if (parameter.Position < parameters.Length - 1)
                        Console.Write(", ");
                }
                Console.WriteLine(")");
            }
        }
    }
}

DDJ

http://www.ddj.com Dr. Dobb’s Journal, May 2005 65


Making the appropriate selection of an ASP to ASP.NET migration strategy is not always clear cut. As a developer, you would probably be tempted to rewrite the entire web application in Microsoft .NET from scratch. However, your manager might not be as enthusiastic about this idea as you are. The application is already in production and satisfies the requirements. Why follow a riskier and more costly path if it is possible to start extending the application in .NET while preserving the investment in the legacy ASP code? Especially if a complete migration can be achieved gradually in a number of evolutionary steps, with every step resolving a concrete, migration-justifiable problem.

Side-By-Side Approach

Migration is not necessarily an "all or nothing" proposition. ASP and ASP.NET can run side-by-side (coexist) at the same time: a web site, or a web application within a site, can contain both ASP and ASP.NET pages. Legacy code can be replaced later in one shot, gradually as necessary, or never replaced at all. The side-by-side approach has its own pros and cons; see Table 1.

It is important to emphasize that even with a minimalist "do not touch legacy code" strategy, some changes to this code must be made. They are:

• Small, unanticipated future updates that are tightly coupled with legacy code, where updating in place is preferable to migrating the code. For anything beyond minor updates, one of the migration strategies is preferred.

• Changes driven by the decision to expose legacy code to .NET code.

• Changes driven by the decision to access .NET code from legacy code.

• Changes that may be necessary to keep both the ASP and ASP.NET parts of an application alive in case of a possible timeout.

• A legacy code update for ASP and ASP.NET state synchronization.

Legacy code can be migrated gradually, applying migration strategies and running legacy code for a long period of time until a complete migration is achieved. In this case, all problems and benefits of the side-by-side approach continue to exist for the duration of this process.

The idea behind the side-by-side approach relies on the assumption that legacy code and .NET code implement different parts of the application with low coupling between them. Where the two parts do share common functionality, there are three possible solutions:

• If common functionality is already incorporated as a COM object in legacy code, and has a proper interface that fits or can be easily adapted to the new architecture of the .NET application, then it can be exposed to .NET code by the Runtime Callable Wrapper (RCW).

• If common functionality is already incorporated as a mix of scripts and/or COM objects and a significant amount of work is necessary to expose it to the .NET application, then this functionality can be removed from legacy code, implemented in the .NET application code, and exposed to legacy code as a COM Callable Wrapper (CCW).

• If removing common functionality from legacy code and replacing it with CCW


ASP to ASP.NET Migration Strategy

Understanding possible migration paths leads to optimal migration strategies

MARK SOROKIN

Mark is a senior enterprise architect for a consulting company. He can be contacted at [email protected].

Windows/.NET Developer


invocations leads to serious legacy code shuffling, then duplicating the common functionality can be an option. This decision has to be made with a clear understanding of the consequences of maintaining duplicated code, even if only for a certain period of time, until the complete migration process is finished.

Figure 1 illustrates legacy application functionality exposed to the .NET part of a side-by-side application. Note that this article discusses logical layers, not physical tiers of application deployment. Figure 2 illustrates .NET application functionality exposed to the legacy part of a side-by-side application.

The side-by-side approach has its own problem that does not exist for any one ASP or ASP.NET application separately: state synchronization between the ASP and ASP.NET environments. ASP and ASP.NET have different state management. This means that custom synchronization of application and session state has to be developed, and minimal legacy code changes have to be made.

There are two major ways to synchronize state in the side-by-side scenario:

• Client-based synchronization.
• Server-based synchronization.

Client-Based Synchronization

With client-based synchronization, the application and session states ("state") can be shared by passing state data through the browser. This way, the state is passed from page to page with request/response data. The possible methods are:

• Passing state using cookies.
• Passing state in URL strings (URL munging).
• Passing state using hidden form fields.

If a legacy application already uses one of these methods to pass state between pages, then no adjustments in legacy code are needed. Otherwise, if the intrinsic Application or Session objects are used, then the object's state can be passed back and forth between ASP and ASP.NET pages using one of the three methods just mentioned.


Figure 1: Common business functionality is encapsulated in a COM object and accessed from .NET managed code by the RCW.


Figure 2: Common business functionality has been moved into managed code and exposed to unmanaged code by the CCW.


Pros

1. The investment in legacy code is preserved.
2. Relatively small risk associated with the migration process.
3. Relatively low cost of migration.
4. Personnel are in place and trained, and the application delivers a service today.
5. New functionality can be added separately, minimally affecting existing code.
6. New functionality added in .NET gains .NET benefits.

Cons

1. Legacy code and .NET code have to coexist. This adds the cost of maintaining two different environments; both have to be deployed and supported.
2. Application and session states of ASP and ASP.NET live in different processes and have to be synchronized. This involves additional cost in development, testing, and support.
3. Legacy code does not gain productivity, maintainability, extensibility, performance, and other .NET benefits.
4. Performance of interoperability between legacy code and .NET can possibly be an issue.
5. The application architecture as a whole is not consistent; the legacy and .NET application parts have different architectures.
6. The functionality common to the legacy and new .NET code has to be maintained. Possible solutions are discussed below.
7. Some legacy code changes are necessary to communicate with the .NET part of the application.
8. The problem of adding new functionality that affects legacy code can reach significant magnitude, possibly resulting in migration or rewrite anyway.
9. The timeout problem has to be resolved; the ASP and ASP.NET application parts have separate timeout settings.

Table 1: Pros and cons of a side-by-side approach.


Figure 3: Client-based state synchronization.


Figure 4: ASP to ASP.NET client-based synchronization implementation.


Pros

1. Easy to implement. Development time is negligible if only strings and numbers have to be passed.
2. Good for a small state.
3. Good for nonsensitive data.
4. Especially good for static synchronization (see "Static versus Dynamic Synchronization").

Cons

1. Security problem. State is passed over the wire, so state encryption becomes necessary for sensitive data.
2. Security problem. Even with encryption, passing data such as database connection strings imposes a security risk. Internet web sites are especially vulnerable.
3. Passing a relatively large state affects bandwidth and response time. This is especially important for 56K modem users and mobile users.
4. Custom code is needed to send and receive state.
5. Works in a web farm only if load-balancing affinity is in place.

Table 2: Pros and cons of client-based synchronization.


The first two options have more restrictions, such as the size of the state and disabled cookies, but do not differ conceptually from the last one. Figure 3 illustrates client-based state synchronization. Table 2 lists the pros and cons of client-based synchronization.

The client-based synchronization approach can be implemented by developing two intermediate pages that sit between the ASP and ASP.NET pages involved in client-based synchronization. These pages, named "ClientBasedSync.asp" and "ClientBasedSync.aspx," are shown in Figure 4. Here, the ASP page uses Server.Transfer to the first intermediate ASP page, ClientBasedSync.asp, instead of redirecting directly to the destination ASP.NET page. ClientBasedSync.asp generates a temporary form on the fly, placing state variable name/value pairs into its hidden fields, and submits it to the second intermediate ASP.NET page, ClientBasedSync.aspx. This page assigns the state variables of the ASP.NET part of the application and uses Server.Transfer to reach the destination ASP.NET page.

Notice, however, that only one trip to the client's browser is needed. A similar approach works in the other synchronization direction, from ASP.NET to ASP. If the state contains more complex types, not simple strings or numbers, then they have to be serialized in a form understandable by both environments.
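To make the handshake concrete, here is a hedged sketch of what the ClientBasedSync.aspx code-behind might look like. The page class, the Page_Load wiring, and the "destination" field name are assumptions for illustration, not code from the article.

```csharp
using System;
using System.Web.UI;

// Hypothetical code-behind for the intermediate ClientBasedSync.aspx page.
// The form generated by ClientBasedSync.asp posts the ASP state as hidden
// fields; this page copies them into the ASP.NET Session and transfers on.
public class ClientBasedSync : Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        foreach (string key in Request.Form.AllKeys)
        {
            // Every hidden field except the (assumed) destination marker
            // becomes a session variable on the ASP.NET side.
            if (key != "destination")
                Session[key] = Request.Form[key];
        }
        // Continue to the target ASP.NET page without another browser round trip.
        Server.Transfer(Request.Form["destination"]);
    }
}
```

A mirror-image pair of pages would handle the ASP.NET-to-ASP direction.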

Server-Based Synchronization

With server-based synchronization, the state can be shared by passing data through a common location on the server side. This location can be a:

• Dedicated server process memory space.
• Database.

ASP.NET supports both options, but classic ASP supports neither. ASP.NET saves the state in a proprietary binary format that can be changed in the future. The ASP.NET state cannot be directly instantiated from classic ASP. This makes it difficult to use the ASP.NET state from the ASP part of the application. The better solution is to rely on some custom common state format that both ASP and ASP.NET can access and understand. In this case, custom server-side synchronization has to be developed in-house or third-party software must be acquired.

The replacement of the built-in ASP and ASP.NET state management with a custom one should follow the dictionary pattern by manipulating key/value pairs:

CustomStateObject(theKey) = theValue
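On the .NET side, such a dictionary-pattern state object might be sketched as follows; the class name comes from the snippet above, while the in-memory Hashtable backing store is an assumption (a real implementation would persist to the shared database or server process).

```csharp
using System.Collections;

// Sketch of a custom state store following the dictionary pattern.
// The indexer mirrors how ASP and ASP.NET code already reads and
// writes Session("key") / Session["key"], so call sites barely change.
public class CustomStateObject
{
    private readonly Hashtable items = new Hashtable();

    public object this[string key]
    {
        get { return items[key]; }
        set { items[key] = value; }  // a real version would also persist here
    }
}
```

Usage stays familiar: state["UserName"] = "mark"; on the .NET side, and the same keys through a COM wrapper on the ASP side.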


Pros

1. Consistent solution for all web sites taking a side-by-side path.
2. Full load balancing can be achieved; the affinity restriction no longer applies.
3. Can be easily switched to native ASP.NET state management by the end of migration.
4. Better security; data remains on the server side.
5. Does not consume bandwidth; data remains on the server side.

Cons

1. Development cost and time. Custom state management has to be implemented for both ASP and ASP.NET (if third-party software is not acquired).
2. Native state management has to be replaced with a custom one. Minor code changes are necessary.
3. Becomes obsolete after all the web sites are migrated.

Table 3: Pros and cons of server-based synchronization.


This lets you easily replace the native ASP and ASP.NET state implementations while preserving the manner in which both of them access state. Later, after complete migration, this custom solution is easily switched to the native ASP.NET state implementation. Figure 5 shows two possible ways of accessing a common state from legacy code: through the callable wrapper and directly. By accessing state through the CCW, the common code is not duplicated, but an interoperability overhead exists. Table 3 lists the pros and cons of server-based synchronization.

Static versus Dynamic Synchronization

There are two ways to synchronize state, either of which can be achieved using client-based or server-based synchronization:

• Static synchronization.
• Dynamic synchronization.

Static synchronization can be utilized if the Application and/or Session states are updated by only a small percentage of web site pages. This presumes a static nature of the web site state, which opens a way to save on state synchronization: instead of synchronizing state on every new page request, the state can be synchronized only when leaving a page that updates it.

With static synchronization, both the ASP and ASP.NET states have to be kept constantly in sync, and it is the developer's responsibility to remember every update: update the ASP.NET state when leaving an ASP page that updated the ASP state and arriving at an ASP.NET page. For the opposite direction, update the ASP state when leaving an ASP.NET page that updated the ASP.NET state and arriving at an ASP page.

State for static synchronization has to be created in the Application_OnStart and Session_OnStart event handlers in both the global.asa and global.asax files. This gives a synchronous start to both the ASP and ASP.NET components of a side-by-side application.

Static synchronization can be achieved easily by passing state in hidden fields with client-based synchronization. State can also be passed, using server-side synchronization, through the database to all pages called by the state-updating page. Table 4 lists the pros and cons of static synchronization.

Dynamic synchronization has to be used if a significant number of pages update the state. In this case, there are no performance gains because the state has to be synchronized constantly. Dynamic synchronization should occur automatically on loading and unloading every page, freeing


Figure 5: Server-based state sharing.


Pros

1. Better performance. State is updated only when needed, saving especially on expensive state serialization.
2. No need for a custom synchronization object. The native Application and Session objects can be used instead of a custom one.
3. No need to serialize and pass any part of the state if it is used only by legacy code or only by .NET code.
4. No code change is needed to replace native Application and Session objects with custom ones.

Cons

1. Custom code instead of a generic solution.
2. Possibility of developer errors, because synchronization is added manually.
3. More testing is needed to check page redirects that have to be synchronized.
4. The affinity restriction for load balancing is in place.

Table 4: Pros and cons of static synchronization.

Pros

1. Generic solution instead of custom code.
2. No possibility of developer errors, because synchronization is done automatically on every page load and unload.
3. Full load balancing can be configured for a custom state object management implementation.
4. No need to duplicate Application and Session state initialization.

Cons

1. Performance hit for expensive state synchronization.
2. The entire state has to be saved/restored for every page.

Table 5: Pros and cons of dynamic synchronization.

Pros

1. Minimum legacy code is migrated, preserving investment.
2. Opens the possibility to improve performance for performance-critical code.
3. Opens the possibility to improve design for design-critical code.
4. Smaller risk compared to other migration strategies.
5. Less time needed for every local migration step compared to other migration strategies.
6. A sequence of local migration steps decreases the risk, compared to other, more revolutionary migration strategies.

Cons

1. .NET migration benefits are applied only locally.
2. Total cost and time to migrate the same amount of code in stages is higher, compared to migrating this code using other, one-shot migration strategies.
3. Often difficult or impossible to separate the functionality to be migrated because of its tight coupling with other parts of the application.
4. A legacy code change may be needed to access migrated functionality.

Table 6: Pros and cons of local migration.


you from adding synchronization code manually. By replacing the ASP and ASP.NET states with custom state management, the same state storage is used synchronously by both. Table 5 lists the pros and cons of dynamic synchronization.

Migration Strategies

The side-by-side approach lets legacy code and new .NET code coexist for a period of time. Even if the existing legacy code is completely satisfactory, in many cases it is better to relocate some common functionality into the new .NET part of a web site.

Migration ranges from local migration to complete migration of the application as a whole. Different strategies can be applied at different stages of this migration process, ending with a complete migration or the end of application support, whichever comes first.

In a partial migration, interoperability plays an important role in connecting the unmanaged and managed worlds. RCW and CCW are the two major ways to interoperate. Unmanaged code in native Win32 DLLs can also be called through the interoperability mechanism known as "Platform Invocation" (P/Invoke). However, the .NET Framework does not support calling from Win32 DLLs into .NET managed code. To call directly from unmanaged to managed code, COM interoperability must be used.
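For reference, a minimal P/Invoke declaration looks like this; GetTickCount is simply a convenient Win32 API chosen here to demonstrate the mechanism.

```csharp
using System;
using System.Runtime.InteropServices;

class PInvokeDemo
{
    // P/Invoke declaration: the runtime loads kernel32.dll and
    // marshals the call directly; no COM interop layer is involved.
    [DllImport("kernel32.dll")]
    static extern uint GetTickCount();

    static void Main()
    {
        Console.WriteLine("Milliseconds since boot: {0}", GetTickCount());
    }
}
```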

The main migration strategies are local migration, horizontal migration, vertical migration, and rewrite and optimization.

Local Migration

Local migration is the migration of a cohesive part of one of the legacy application layers. Underlying layers' code tightly coupled with this part can also be migrated. In most cases, it will be a migration of some part of a business layer, with a corresponding data layer part. Another option is migrating part of a presentation layer that is based on complex business logic left in legacy code. Figure 6 illustrates these options.

Some of the reasons for local migration can be:

• Common business functionality for the legacy and .NET parts of an application.
• Performance problems in legacy code.
• Part of a legacy application that has to be improved and/or extended.
• Part of a legacy application that will be used by other applications.

Local migration implies calling between legacy unmanaged code and .NET managed code through the interoperability layer (if duplication is not selected). The RCW or CCW, depending on the direction of invocation, translates data between the two environments. Some blittable data types, such as integers, floats, and so on, do not require translation. Nonblittable data types, such as strings, require translation. Nonblittable data-type conversion overhead affects performance.

In many cases, this performance overhead is negligible compared to the amount of work the concrete component (COM or .NET) is doing. If, however, a lightweight component is accessed in a chatty fashion, then this overhead can be significant. Setting and getting different component properties multiple times can have a performance impact, since every call crosses an interoperability boundary.

Another problem can appear if the existing COM interface does not satisfy a .NET client for reasons other than performance; for example, when it has to be redesigned.

Both problems can be resolved by creating a Custom Managed Wrapper (CMW) in front of the RCW. The CMW exposes an existing COM component to a .NET client by including the necessary additional code in this managed adapter. The CMW consumes the RCW internally and delegates most of the calls to COM through it. The legacy code can continue accessing the COM component directly without any change.

The CMW lets you take advantage of:

• Inheritance.
• Movement of COM object functionality into the CMW, eliminating chatty access across the interoperability boundary to improve performance.
• Parameterized constructors.
• Static methods.
• Complex data conversions; for example, between an ADO.NET dataset and an ADO recordset.
• Gradual movement of more and more functionality into the CMW, improving performance and approaching a complete component migration without affecting the .NET client.
• A better-designed interface.
• Movement of a remoting boundary from DCOM to .NET Remoting, if such a boundary exists. This allows closing the DCOM ports in the firewall.

Figure 7 illustrates this approach, while Table 6 lists the pros and cons of local migration.
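The shape of such a wrapper might look like the following sketch; the LegacyInterop.Order RCW type and its members are hypothetical stand-ins for whatever the imported COM type library actually exposes.

```csharp
// Custom Managed Wrapper sketch: consumes the RCW internally and
// presents a .NET-friendly surface to managed clients.
public class OrderWrapper
{
    // The RCW produced by importing the COM type library (hypothetical type).
    private readonly LegacyInterop.Order rcw = new LegacyInterop.Order();

    // One managed call can replace several chatty per-property
    // interop calls, crossing the boundary only once.
    public void Configure(int id, string customer, decimal amount)
    {
        rcw.Init(id, customer, (double)amount);  // hypothetical COM method
    }

    // Data conversions (for example, ADO recordset to ADO.NET dataset)
    // can also be hidden inside the wrapper.
    public decimal Total
    {
        get { return (decimal)rcw.GetTotal(); }  // hypothetical COM method
    }
}
```

As functionality migrates into the CMW over time, .NET clients consuming OrderWrapper remain unaffected.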

Horizontal Migration

Horizontal migration involves migration of an entire layer of the application; Figure 8 illustrates it. Horizontal migration can also be done in pairs: business and data layers, for example. Horizontal migration can imply additional cost and time for mapping data between layers. For example, migrating the data layer while keeping other layers as-is requires conversion between an ADO.NET dataset and an ADO recordset. Table 7 lists the pros and cons of presentation-layer migration.

To transparently replace business-layer COM components with .NET classes,


Figure 6: Local migration main options.


Figure 7: Migration and a Custom Managed Wrapper.



without affecting the legacy presentation layer, the GUIDs and/or ProgIds of these COM components must be maintained, and the interoperability assemblies have to be deployed. Table 8 lists the pros and cons of business-layer migration.
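A minimal sketch of keeping the identity stable might look like this; the class name, ProgId, and GUID below are placeholders that would be replaced with the original COM component's registered values.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical sketch: a .NET replacement for a business-layer COM
// component reuses the original component's GUID and ProgId so that
// legacy ASP's Server.CreateObject("MyApp.Orders") still resolves,
// now through the CCW.
[ComVisible(true)]
[Guid("B9A2D1E4-3C5F-4A6B-8D7E-1F2A3B4C5D6E")]  // original component's CLSID
[ProgId("MyApp.Orders")]                         // original component's ProgId
public class Orders
{
    public double GetTotal(int orderId)
    {
        return 0.0;  // migrated business logic goes here
    }
}
```

The assembly still has to be registered for COM (for example, with regasm) on every web server where the legacy pages run.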

Data-layer COM component migration can be done with the same approach and, in general, repeats the pros and cons of business-layer COM component migration. There is no need for ADO recordset and ADO.NET dataset conversion if the other layers were optimized to consume the ADO.NET DataSet and/or DataReader.

Vertical Migration

Vertical migration involves migrating a portion of the application vertically through all application layers. This means identifying a logical piece of a web application that has minimal interaction with other pieces and migrating it. Both ASP code and COM components implementing rather independent pieces of application functionality are potentially good candidates for vertical migration. The remaining functionality of the site runs side-by-side with the vertically converted part and can be migrated later, based on project schedules and resources.

The vertically migrated part of the application communicates with the legacy code using .NET interoperability. This interoperability should be minimal, reflecting the low coupling between the vertically migrated part and the rest of the application. Figure 9 illustrates vertical migration, and Table 9 lists its pros and cons.

Rewrite and Optimization

The rewrite/optimization can be done after migration, in parallel with it, or instead of it. This lets you gain .NET benefits such as: a layered application architecture (in case the legacy application did not have one prior to migration), caching, separation of HTML views from code-behind, server controls and data binding, server events, view state, .NET exception handling, improved developer productivity, improved maintainability, and so on.

Migration Guidelines

Here are some common migration guidelines and suggestions:

• For simple sites with low application- and session-state update rates, client-based static synchronization can be used.

• If the application and session state update rate is high, consider horizontal migration of the ASP pages rather than custom state management.

• Move presentation-layer inline code to code-behind for cleaner MVC architecture separation. As a bonus, you get IntelliSense.


Pros

1. The state synchronization problem does not exist. Savings on side-by-side state synchronization development, incorporation, and testing.
2. Native state management.
3. Full load balancing.
4. .NET security mechanisms can be used.
5. Ease of integration with other .NET technologies.
6. Better performance by using page compilation.
7. Better performance by using caching.
8. Can utilize .NET configuration and maintenance benefits.
9. Can utilize .NET deployment benefits.
10. All other .NET managed code benefits.
11. Ready for step-by-step optimization.

Cons

1. The power of .NET is not fully used. Server controls, data binding, and so on, are not utilized.
2. Communicates with business and data layers through COM interoperability, causing possible performance degradation.
3. The old ASP design stays in place. This makes the code difficult to support and extend.
4. Increased risk, cost, and time compared to "as is" side-by-side presentation-layer usage.
5. ASP.NET pages must use a single-threaded apartment (STA) to communicate with COM objects.
6. Interoperability assemblies have to be deployed.
7. The .NET Framework has to be deployed on all web servers.

Table 7: Pros and cons of presentation-layer migration.

Pros

1. No interoperability overhead to access the legacy part of the application business logic from the .NET part, and vice versa.
2. Less risky, lower cost and time if it is difficult to slice business logic for local or vertical migration.
3. Good opportunity to redesign the application business layer to take advantage of .NET.
4. Migrating COM objects to .NET classes improves scalability.

Cons

1. Additional interoperability overhead to access the business logic from the presentation layer of the legacy part of the application (if the presentation layer is not migrated yet).
2. Additional conversion to a recordset or other COM data formats for a nonoptimized presentation layer.
3. Generally higher risk, time, and cost compared to the interoperability option.
4. The .NET Framework has to be deployed on all servers where business logic assemblies are deployed.

Table 8: Pros and cons of business-layer migration.

Pros

1. Works very well if the application has isolated functional parts.
2. Works well together with a parallel extension of existing functionality using .NET.
3. No interoperability overhead between migrated ASP and COM components through all the layers.
4. No conversion between complex data types, like dataset and recordset, is needed.
5. Usually less risky, compared with complete horizontal layer migration.
6. The vertically migrated application part can be redesigned to take advantage of .NET benefits through all migrated layers.

Cons

1. Interoperability overhead to communicate with the rest of the nonmigrated unmanaged code.
2. Additional effort to refactor independent functionality for vertical migration.
3. Migrated shared code has to be accessed through interoperability or duplicated.
4. State synchronization is still needed.
5. Security for a mixed environment has to be addressed.
6. Deployment has to be done for the mixed environment.

Table 9: Pros and cons of vertical migration.


• Complex COM object hierarchies in a business layer should be migrated as a unit to decrease the number of interoperability calls.

• Vertical migration minimizes the work involved in achieving interoperability with ADO.

• Perform code-path analysis before local or vertical migration. The ASP pages and components used during a user's single interaction are considered a code path. Distinct code paths are a natural place to consider isolating a piece for local or vertical migration.

• Code paths that share components are good candidates for vertical migration, minimizing the interoperability code required.

• Minimize touch points between ASP and ASP.NET for vertical migration.

• Consider horizontal migration of a business layer if it is well separated from other layers and all its COM objects are strongly coupled.

• Migrate and rewrite read-only functionality first. By doing this, you can quickly take advantage of the data binding and caching capabilities of ASP.NET or expose data as web services.

• Vertical migration provides a good test for a new .NET web-application design.

• Rearchitect and rewrite from scratch if the goal is to achieve high strategic value and make the best use of the .NET Framework.

The migration lifecycle contains four major steps that are repeated iteratively if necessary:

• Code analysis: deciding what will be in scope for the current migration stage, and designing.
• Migration.
• Rewrite and optimization: refactor migrated code and optimize it; rewrite from scratch if necessary.
• Testing and deploying.

In horizontal migration, these steps are iterated for every layer. In vertical migration, they are iterated for every functional module. In local migration, they are iterated for the local functionality. Local migration can be combined with a horizontal or vertical migration strategy. Figure 10 illustrates the dynamics of a migration process.

Conclusion

Migration from ASP to ASP.NET can be done in different ways. Understanding the possible migration strategies, their drawbacks and benefits, helps software architects and developers select an optimal migration strategy and its implementation.

DDJ


Figure 9: Vertical migration.


Figure 10: Migration state diagram.

Start

Local or HorizontalMigration Test and Deploy

Finish

Local or VerticalMigration

Rewrite andOptimize

Analysis and Design

Figure 8: Horizontal migration.

Legacy Code(Unmanaged Code)

.NET Code(Managed Code)

Migration

Migration

Migration

Business Layer Business Layer

Data Layer

Presentation LayerPresentation Layer

Data Layer


Windows Forms technology is built over Win32 windowing. To effectively use Windows Forms, you have to understand the principles of Win32 windowing, including the relationship of window classes, message queues, and threads. In this article, I touch upon some Windows Forms topics and explain why you need a good understanding of Win32 windowing to write the best Windows Forms code.

Creating Windows and Forms

When you create a new Form, you haven't created a window; all you've done is create a .NET object. The object's constructor just creates the .NET side of things; that is, fields and event handlers are initialized. The actual Win32 window is only created the first time the form is made visible. The Visible property calls the protected method SetVisibleCore. This method attempts to read the form's Handle, and the property get handler calls the protected method CreateHandle, which creates the window.

This two-stage construction means that the constructor should be used to initialize items that do not depend on the window's handle. In light of this, some controls cache data that requires a window handle until the handle has been created. For example, the ListView control is based on the list view common control. If you want to add an item to the control, you do so through the Items property, which is a ListViewItemCollection. The Add method of this class actually delegates the task to the ListView.InsertItem method, which tests to see if the list view control has been created. If so, it inserts a new LVITEM by sending the control the LVM_INSERTITEM message. If the control has not been created, the ListViewItem objects are stored in an ArrayList. When the control is created, the HandleCreated event is raised, and the handler for this event iterates through all the items in the cached ArrayList, adding them to the control.

To create your own control, factor your code so that operations requiring the control to have a window handle can only be called after the handle is created. Or, if the operations could be called before then, cache them and replay them in the HandleCreated event handler.
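The cache-and-replay approach is language neutral. As a hedged sketch (all names here are invented, not Windows Forms APIs), a control can queue operations while it has no native handle and replay them, in order, once the handle exists:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical control that defers operations until its "handle"
// exists, mirroring the ListView caching described above.
class DeferredControl {
    private boolean handleCreated = false;
    private List<Runnable> pending = new ArrayList<>();
    final List<String> items = new ArrayList<>();

    // Queue the operation if the handle is missing; run it otherwise.
    void addItem(String text) {
        if (!handleCreated) {
            pending.add(() -> items.add(text));
        } else {
            items.add(text);
        }
    }

    // Analogous to the HandleCreated event: replay cached operations.
    void createHandle() {
        handleCreated = true;
        for (Runnable op : pending) op.run();
        pending = new ArrayList<>();
    }
}
```

Replaying in insertion order is what preserves item order; mixing a cached insert path with a direct one is exactly how ordering can be lost.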

Indented Items in a ListView

For example, one feature I want to use with the ListView class (available electronically; see "Resource Center," page 5) is the ability to indent the main item. If you use Outlook Express, you know what I mean. When you click on Local Folders, you see a list view that shows all the folders you have, and nested folders are indented from their parent folders. To do this with Win32 is simple. Your list view must be in report view (LVS_REPORT, which Windows Forms calls View.Details) and it must have at least one column. Your items must also have icons because the indentation is in units of the width of an item's image. Then all you need to do is provide a value for the LVITEM.iIndent field that is passed to the control using the LVM_INSERTITEM message.

There is no mechanism to do this with the Windows Forms ListView class, so you have to derive from this class and create your own implementation. Listing One is a partial listing of a class that does this. The InsertIndented method inserts a ListViewItem with the specified indentation. InsertItem is the method that does the actual work by sending the LVM_INSERTITEM message. Because an item may have subitems, SetItemText is a helper method that sets the text of each subitem. Finally, if the list view has not been created, the information is put into an LVIData object and stored in the operations member. When the list view is created, the OnHandleCreated method is called, which iterates through the items in the operations field and inserts them into the list view.

You must first create the code to insert an item, so you need to write the managed version of LVITEM and gain access to SendMessage using Platform Invoke; see Listing Two. There are just two points to make about this:

• LVM_INSERTITEM is passed a pointer to an unmanaged LVITEM structure, but you do not have to worry about this because Platform Invoke does the work for you when you declare the final parameter as ref LVITEM. The ref indicates that a pointer is passed, and LVITEM refers to the managed class.

• This class is marked with [StructLayout(LayoutKind.Sequential)], which indicates that the fields are placed in memory in the order you specify. These fields have the same size as the fields in the unmanaged LVITEM structure.

Listing Three accesses the list view control. InsertItem simply initializes an LVITEM object and calls SendMessage. The ListView class uses the lParam member to hold a unique ID to identify the item. To generate this ID, I need to use the same mechanism as the ListView class. It does this with a private method called GenerateUniqueID. Because this is private, I use reflection to access it from the base class. When the item is added to the control, I also add it to the private Hashtable in the base class called listItemsTable and update a field that holds the number of items. I don't like using reflection like this, but there is no other way to get access to private members.


Windows Forms and Win32
Understanding Win32 windowing is the key

RICHARD GRIMES

Richard is the author of Programming with Managed Extensions for Microsoft Visual C++ .NET 2003 (Microsoft Press, 2003). He can be contacted at [email protected].

WINDOWS/.NET DEVELOPER



The Items collection of the ListView class is based on a Hashtable of ListViewItem objects, each containing information about the item in the control, including its display ID and the unique ID assigned to the item. These values are stored in private fields in ListViewItem and are assigned using the private method Host. The UpdateItem method uses reflection to call this method.

Listing Four is the remaining code. InsertIndented is the public method used to insert items into the list view control. First, it checks to see if the control has been created. If not, the information is stored in the operations container; otherwise, InsertItem is called. The implementation of OnHandleCreated merely checks to see if there are any cached items and, if so, inserts them using InsertItem. There is one caveat to this method: If you intersperse calls to InsertIndented with calls to ListView.Items.Add while operations are being cached, the final order of the items in the control will not be preserved. It is prudent to use only InsertIndented. Listing Five shows code that uses this control; the LoadIconFromResources method is not shown, so you should implement it to load an icon from a file or obtain one from the application's resources.

A fair amount of code had to be written to add a simple piece of functionality. Most of this code was required to bypass the existing mechanism for inserting items into the control while still allowing the existing functions to continue to work. It's a pity that Microsoft did not add this functionality to the ListView class.

Application Contexts

Listing Five shows the standard way to create a form: Pass a new instance of the form object to the Application.Run method. I mentioned earlier that the form's window is not created until the Visible property is set to true, so where does this happen? This is one of the responsibilities of the Run method, although this method does a lot more. Every Windows application needs a "message pump." In its simplest form, this calls the Win32 GetMessage function to retrieve the next message in the thread's message queue, then calls DispatchMessage to call the window procedure of the window that the message is intended for. Notice that I said the thread's message queue. Message queues, and hence windows, have thread affinity. If a window is created on a specific thread, then its messages will be sent to the message queue for that thread.

The application's main thread (or indeed, any thread with Thread.IsBackground set to false) will keep the application alive. If the main thread dies, so will the application's process. Similarly, as long as the thread stays alive, so does the process. However, windows have a different lifetime mechanism. Windows that have a caption bar have an adornment (the "X" button), and if users click it they expect the window to close. If the application has just one window, users expect closing this window to kill the application's process. Win32 developers handle this by implementing the message pump as a loop that breaks if GetMessage returns false. This happens when GetMessage reads the WM_QUIT message from the message queue. Consequently, you control the application's lifetime by posting this message to the main thread's message queue.

Application.Run is called on the main thread, and it implements the message pump. This means that the main thread is kept alive, handling the pump and dispatching messages to the various windows in the application. When this method stops reading messages and returns, the main thread dies and the application's process dies. However, this does bring up the question of how the message pump loop is broken. The answer lies in application contexts.

The Application class lets you register a main form for the thread where Application.Run is called. If you look at the documentation for this class, you'll see that all the members are static, which raises the question of where this information is held. Application has a nested class called ThreadContext, and instances of this class are held in thread-local storage; that is, each thread has a different instance. When you call Application.Run, the current thread's ThreadContext object is obtained.

The Application class has the method ExitThread, which a form could use to close down the message pump. Listing Six is a simple forms application that creates a form, makes it visible, and calls Application.Run with no parameters. The call to Run provides the message pump, which is stopped with a call to ExitThread in the Closed event handler. If you comment out this line and run the application, you'll see that clicking the X adornment closes the window. However, if you run Task Manager, you'll see that the process continues to run; without a window, you'll have to use Task Manager to close the process.

The version of Run that you usually call is the overload that takes a Form object. However, all this does is wrap the Form object in an ApplicationContext object and pass it to another overload of the Run method, which starts up the message pump. The ApplicationContext class is used as a bridge between a form and the implementation of the message pump, so that if the message pump ends, the main form closes, and if the main form closes, the message loop dies.

The message pump is on the ThreadContext object for the current thread (a method called RunMessageLoopInner). In addition, the ApplicationContext and its MainForm are cached as fields in the ThreadContext. When the message pump loop finishes, Dispose is called on the ThreadContext object, which enumerates all the windows created on the current thread and then disposes each one. This means that when the message pump ends, the forms are allowed to clean up their resources.

The other requirement is that if the form closes, the message loop should be stopped. The ApplicationContext object has a method called OnMainFormDestroy, which is added to the form's HandleDestroyed event when the context object's MainForm property is set in the constructor. This event is the last one raised when a form window is destroyed; at this point, the form object is still alive. OnMainFormDestroy calls ExitThread, which stops the message pump. Again, when the message pump loop finishes, all the forms created on the thread are disposed, and their Dispose methods are called to clean up the components they hold.

The first thing that ExitThread does is obtain the thread's ThreadContext object; then it tests to see if there is an application context object. If so, Dispose is called on this object. ApplicationContext.Dispose merely releases the reference to the main form (if it has one), so that there is one less reference to prevent the form from being finalized. If there is no application context object, then the thread context object is disposed, and if there are no messages in the message queue, all the windows created on the thread are disposed, as mentioned earlier. If there are messages in the message queue, the message pump should handle those messages, but no more, so ThreadContext.Dispose provides an asynchronous shutdown mechanism. It works like this: Dispose posts the custom message MSG_APPQUIT (an operation that does not block) to the thread's message queue. The message pump is still active at this point, so after it has handled all the other messages in the queue, the MSG_APPQUIT message is handled by disposing all the thread's windows and then posting the WM_QUIT message, which finally ends the loop.

As you can see, the mechanism for closing down the message loop and handling window disposal is ordered, and much of the code is similar to the code you'll see in a Win32 application.

Threading

The final issue I want to mention is threading. The messages for a window are



placed in the message queue for the thread that created the window, so if you interact with a control, you must do so on the GUI thread. On the other hand, the GUI thread (usually the main thread) spends all of its time pumping the message queue. When a message is retrieved, it is dispatched to the window procedure for the appropriate window. The actual window procedure is registered to be a method called WndProc in a class called ControlNativeWindow, a nested class in Control. This method does little processing and passes the message on to the WndProc method defined for the .NET control. This method, and the method it overrides in the control's base class, is effectively a huge switch statement that handles individual messages by raising events, just like any Win32 process.

To generate events, the control usually has a method with the prefix On (for example, OnResize), which obtains the delegate for the event (for example, Resize) and invokes it. You can handle the message either by overriding the On method or by adding a delegate to the event. If you override the On method, you must make sure that you call the base class implementation so that the event delegate is still invoked. The important point is that the thread that pumps the message queue is the same thread that runs WndProc, which is the same thread that invokes the event delegate. So if your event handlers are lengthy, you are preventing the pumping of the message queue, which means that messages intended to update the UI are not handled in a timely fashion. It makes sense to execute lengthy operations on another thread.

However, when an operation interacts with the UI, it does so by generating messages, and as I have already mentioned, those messages must be sent and handled on the correct thread. Windows Forms provides a mechanism to do this. The Control class implements an interface called ISynchronizeInvoke with three methods and a property. The InvokeRequired property indicates whether the caller must marshal calls to the control: if you try to access the control from another thread, you should use Invoke. Invoke is passed a delegate and an array of arguments; the delegate is the code that you want invoked on the GUI thread. In effect, this method checks to see if the current thread is the GUI thread. If it is, no marshaling is required and the delegate is invoked straight away. If the thread is not the GUI thread, then the delegate and parameters are put into a separate "job" object and added to a queue maintained by the control. The GUI thread is then informed by posting a custom message to its message queue. The handler for this message reads all the job objects in the control's queue and invokes the delegate on each one.

The other two methods on the interface are BeginInvoke and EndInvoke, which let you invoke the delegate asynchronously. Of course, the whole point of Invoke is that the delegate is invoked on another thread, and the difference between it and the asynchronous methods is that Invoke blocks until the delegate has completed, whereas BeginInvoke returns as soon as the message is posted to the GUI thread and provides a call object that you can test for completion of the invocation. This only makes a difference if the delegate returns a value, in which case you should call EndInvoke to retrieve the data at a later time.
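The marshaling scheme reduces to a single consumer thread draining a job queue. As a rough Java analogue (invented names; this is not the Windows Forms API), a single-thread executor plays the GUI thread: submitting a job mirrors BeginInvoke, and blocking on the returned future mirrors Invoke/EndInvoke:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class GuiThreadSketch {
    // The lone worker thread stands in for the GUI thread pumping messages.
    static final ExecutorService gui = Executors.newSingleThreadExecutor();

    // BeginInvoke-like: queue the job and return a handle immediately.
    static Future<String> beginInvoke(Callable<String> job) {
        return gui.submit(job);
    }

    // Invoke-like: queue the job and block until it has completed.
    static String invoke(Callable<String> job) {
        try {
            return gui.submit(job).get(); // get() plays the role of EndInvoke
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

A real control also short-circuits when the caller is already on the GUI thread; this sketch omits that check. Call gui.shutdown() when done, or the non-daemon worker keeps the process alive, much like a message pump that never reads WM_QUIT.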

Conclusion

To effectively use Windows Forms, you need to have an understanding of how Win32 windowing works. Merely knowing the mechanics of Windows Forms is not enough. To be able to use the library effectively, and to prevent yourself from writing code that could seriously hurt the responsiveness of your user interface, you have to know and apply Win32 windowing principles.

DDJ


Listing One

class IndentListView : ListView
{
    public int InsertIndented(ListViewItem lvi, int indentLevel);
    protected int InsertItem(ListViewItem lvi, int indentLevel, int id);
    protected void SetItemText(int itemIndex, int subItemIndex, string text);
    class LVIData {}
    private ArrayList operations = null;
    protected override void OnHandleCreated(EventArgs e);
}

Listing Two

[DllImport("user32")]
static extern int SendMessage(IntPtr hWnd, int msg,
    int wParam, ref LVITEM lParam);

[StructLayout(LayoutKind.Sequential)]
struct LVITEM
{
    public const int LVIF_TEXT = 0x0001;
    public const int LVIF_IMAGE = 0x0002;
    public const int LVIF_PARAM = 0x0004;
    public const int LVIF_INDENT = 0x0010;
    public const int LVM_INSERTITEM = 0x1007;
    public const int LVM_SETITEMTEXT = 0x102d;

    public uint mask;
    public int iItem;
    public int iSubItem;
    public uint state;
    public uint stateMask;
    public string pszText;
    public int cchTextMax;
    public int iImage;
    public IntPtr lParam;
    public int iIndent;
    public int iGroupId;
    public uint cColumns;
    public uint puColumns;
}

Listing Three

protected int InsertItem(ListViewItem lvi, int indentLevel, int id)
{
    int dispIdx = GetCount() + 1;
    LVITEM lvitem = new LVITEM();
    lvitem.mask = LVITEM.LVIF_TEXT | LVITEM.LVIF_PARAM
        | LVITEM.LVIF_IMAGE | LVITEM.LVIF_INDENT;
    lvitem.iItem = dispIdx;
    lvitem.pszText = lvi.Text;
    lvitem.iImage = lvi.ImageIndex;
    lvitem.iIndent = indentLevel;
    lvitem.lParam = (IntPtr)id;
    AddToItemsTable(lvi, id);
    dispIdx = SendMessage(this.Handle, LVITEM.LVM_INSERTITEM, 0, ref lvitem);
    UpdateItem(lvi, id, dispIdx);
    for (int idx = 0; idx < lvi.SubItems.Count; ++idx)
    {
        SetItemText(id, idx, lvi.SubItems[idx].Text);
    }
    return lvi.Index;
}

protected void SetItemText(int itemIndex, int subItemIndex, string text)
{
    LVITEM lvitem = new LVITEM();
    lvitem.mask = LVITEM.LVIF_TEXT;
    lvitem.iItem = itemIndex;
    lvitem.iSubItem = subItemIndex;
    lvitem.pszText = text;
    SendMessage(this.Handle, LVITEM.LVM_SETITEMTEXT, itemIndex, ref lvitem);
}

private int GenerateNextID()
{
    Type type = typeof(ListView);
    MethodInfo mi = type.GetMethod("GenerateUniqueID",
        BindingFlags.NonPublic | BindingFlags.Instance);
    return (int)mi.Invoke(this, null);
}

private void AddToItemsTable(ListViewItem lvi, int id)
{
    Type type = typeof(ListView);
    FieldInfo fi = type.GetField("listItemsTable",
        BindingFlags.NonPublic | BindingFlags.Instance);
    Hashtable listItemsTable = (Hashtable)fi.GetValue(this);
    listItemsTable.Add(id, lvi);
    fi = type.GetField("itemCount",
        BindingFlags.NonPublic | BindingFlags.Instance);
    int count = (int)fi.GetValue(this);
    fi.SetValue(this, ++count);
}

private void UpdateItem(ListViewItem lvi, int id, int dispIdx)
{
    Type type = typeof(ListViewItem);
    MethodInfo mi = type.GetMethod("Host",
        BindingFlags.NonPublic | BindingFlags.Instance);
    object[] args = new object[] { this, id, dispIdx };
    mi.Invoke(lvi, args);
}


Listing Four

class LVIData
{
    public ListViewItem lvi;
    public int IndexLevel;
    public int ID;
}

public int InsertIndented(ListViewItem lvi, int indentLevel)
{
    // Use IsHandleCreated here; reading the Handle property would
    // force the handle to be created, defeating the caching path.
    if (!this.IsHandleCreated)
    {
        if (operations == null)
            operations = new ArrayList();
        LVIData data = new LVIData();
        data.lvi = lvi;
        data.IndexLevel = indentLevel;
        data.ID = GenerateNextID();
        operations.Add(data);
        return data.ID;
    }
    return InsertItem(lvi, indentLevel, GenerateNextID());
}

protected override void OnHandleCreated(EventArgs e)
{
    if (operations != null)
    {
        for (int idx = 0; idx < operations.Count; ++idx)
        {
            LVIData data = operations[idx] as LVIData;
            InsertItem(data.lvi, data.IndexLevel, data.ID);
        }
        operations = null;
    }
    base.OnHandleCreated(e);
}

Listing Five

public class MainForm : Form
{
    private IndentListView lv;
    public MainForm()
    {
        this.lv = new IndentListView();
        this.lv.Dock = DockStyle.Fill;
        this.lv.View = View.Details;
        ImageList il = new ImageList();
        Icon ic = LoadIconFromResources("first_icon");
        il.Images.Add(ic);
        lv.SmallImageList = il;

        ColumnHeader header;
        header = new ColumnHeader();
        this.lv.Columns.Add(header);
        header.Text = "Data";
        header.Width = 200;
        this.Controls.Add(this.lv);

        ListViewItem lvi = new ListViewItem("one", 0);
        lv.InsertIndented(lvi, 0);
        lvi = new ListViewItem("two", 0);
        lv.InsertIndented(lvi, 1);
        lvi = new ListViewItem("three", 0);
        lv.InsertIndented(lvi, 0);
    }
    static void Main()
    {
        Application.Run(new MainForm());
    }
}

Listing Six

class MainForm : Form
{
    MainForm()
    {
        this.Closed += new EventHandler(ClosedForm);
    }
    void ClosedForm(object sender, EventArgs e)
    {
        Application.ExitThread();
    }
    static void Main()
    {
        MainForm form = new MainForm();
        form.Visible = true;
        Application.Run();
    }
}

DDJ


More .NET on DDJ.com

ASP.NET2theMax: Repeating Data in ASP.NET Pages. ASP.NET includes page caching to help you serve up static pages faster, but for even better performance, use the caching built into WS2003/IIS 6.0.

Available online at http://www.ddj.com/documents/ddj050209asp/


Class loaders load Java classes into memory and prepare them for use by the JVM. Engineers doing Java development become familiar with the class loader through the CLASSPATH environment variable, or the command-line option to the VM, that tells the class loader where to look for class files. For most applications, the default class loader supplied with the Java Runtime Environment (JRE) works just fine once supplied with the correct parameters.

The Eclipse environment, with its plug-in model, cannot use the built-in class loader, as it needs finer control over how the system locates classes. The plug-in model that the designers of Eclipse created gives users the ability to create a plug-in with its own CLASSPATH that is independent of other plug-ins. This design decision lets plug-in developers control what classes their plug-in will load while not interfering with any other plug-in.

When creating a plug-in manifest, users specify the class path almost as a side effect. While the class path created by the plug-in manifest is usually correct, sometimes the programmer needs more control over how the Eclipse class loader operates than is offered by the plug-in manifest editor. In this article, I describe the theory and strategy behind controlling the Eclipse class loader, showing how Eclipse can be trained to load and execute an arbitrary class, even if the class does not reside in the plug-in's declared class path. Handy uses for such a feature include the ability to create classes that do housekeeping chores that you would like to keep in your development toolbox, but don't necessarily want to include in production code.

Bootstrap and Custom Class Loaders

All classes used in a Java application must be loaded either by the bootstrap class loader, otherwise known as the system class loader, or through a custom, user-defined class loader. The bootstrap class loader is the "root" class loader and, as such, forms an integral part of the JRE, responsible for loading the basic Java library classes. Every time a class is instantiated with Java's new keyword, the JVM delegates the task to the current class loader, which is by default the bootstrap class loader.

Custom class loaders, on the other hand, are not part of the JRE; rather, they are subclasses of the abstract base class java.lang.ClassLoader and are compiled, instantiated, and run like any other Java class. However, there are a few general rules that custom class loaders are required to follow:

• Class loaders must have a parent class loader, except for the bootstrap class loader.

• Class loaders must delegate the loading of classes to their parent class loader before attempting to load a class themselves.

• A class can only be loaded once by any one class loader; any attempt to load an identical class (where the name serves as the identifier) in a class loader that has already loaded a class of the same name results in the cached copy of the original class being loaded. This rule implies that a class may be loaded several different times by several different loaders, but only once by each class loader.

Class Loading Nickel Tour

The basics of class loading are straightforward; it's the nuances that make custom class loading so tricky. Class loaders follow a hierarchical structure, with the bootstrap class loader as the root of the class loader "tree" hierarchy; see Figure 1.

Again, Java class loading follows a "delegation" model, in which class loaders are expected to first attempt to delegate the loading of a class to the parent class loader before subsequently attempting to load the class themselves. The class loader that first receives the request to load a class is referred to as the initiating class loader. The class loader that actually ends up loading the class is referred to as the effective class loader. One important factor that derives from this upward delegation model is the issue of class visibility. In Java, class visibility extends upwards through the class loader hierarchy. In practice, what this means is that a particular class loader instance can access any classes that its parent hierarchy has loaded, along with any classes that it has itself loaded.

For example, if a class-loading request is received by class loader A (which has the bootstrap class loader, which I refer to as CL, as its parent), then class loader A is referred to as the initiating class loader. By definition, class loader A must delegate the actual loading of the class to its parent before attempting to load the class itself. If the parent class loader CL succeeds in loading the class, then CL becomes the effective class loader. If CL cannot load the class, then the responsibility for loading the class is returned to class loader A, which will attempt to load the class itself. Assuming that the class can be loaded by class loader A, then class loader A becomes both the initiating and effective class loader. At this point, if you were to examine the class visibility scope, you would see that, from the perspective of class loader A, all classes currently accessible by the parent class loader CL are

Eclipse & Custom Class Loaders
Preparing classes for the JVM

GREG BEDNAREK

Greg is an alumnus of McGill University Computer Engineering and is currently employed as a software contractor with Medrad Inc. in Pittsburgh, Pennsylvania. He can be contacted at [email protected].

EMBEDDED SYSTEMS




visible. However, the converse is not true. From the perspective of the parent class loader CL, classes loaded by class loader A are not visible.

This leads to the following model for class loading, based on the java.lang.ClassLoader implementation:

1. A request is made to load a class (this can occur in any number of ways, such as through new or through a direct call to the loadClass( ) method of a custom class loader).

2. The class loader calls its own loadClass( ) method, which in turn: (a) invokes its findLoadedClass( ) method to see if the class has already been loaded by this class loader. For a previously loaded class, the loader returns a reference to the class maintained in its cache; (b) invokes the loadClass( ) method of its parent class loader, thus delegating the chore to the parent to perform first. If the class cannot be loaded by the parent, the class load is again delegated to its parent, until the calls reach the bootstrap class loader; (c) if the class has still not been loaded, the class loader invokes its own findClass( ) method, attempting to find and load the class itself.

The loadClass( ) method of java.lang.ClassLoader itself simply attempts to load the class specified by its input parameter, given as a string.

In general, it is safer (and easier) to override the findClass( ) method and leave the loadClass( ) method intact when creating a custom class loader based upon java.lang.ClassLoader, as it is the loadClass( ) method that enforces the delegating nature of Java class loaders. Simply overriding the findClass( ) method allows for class loading based on custom needs, while at the same time maintaining expected compatibility with other Java class loaders.
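As a hedged illustration of that advice (the class name here is invented), overriding only findClass( ) leaves the parent-first protocol intact: a request for a core class is satisfied by the parent chain and never reaches findClass( ), which runs only after delegation has failed:

```java
// Overrides only findClass(); loadClass()'s parent-first delegation
// is inherited unchanged from java.lang.ClassLoader.
class TracingClassLoader extends ClassLoader {
    int missCount = 0; // how many requests fell through to findClass()

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        // Reached only when the parent chain could not load the class.
        missCount++;
        throw new ClassNotFoundException(name);
    }
}
```

Loading java.lang.String through such a loader returns the very Class object the bootstrap loader defined, without findClass( ) ever being consulted.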

The Eclipse Platform and Class Loading

The Eclipse platform conforms to the delegating model for Java class loading. Eclipse maintains its own custom "system" class loader, which is loaded by the bootstrap class loader when Eclipse starts. This class loader, called org.eclipse.core.internal.boot.PlatformClassLoader, itself instantiates an org.eclipse.core.internal.plugins.PluginClassLoader for each plug-in. Each PluginClassLoader is then responsible for loading the classes associated with its particular plug-in. From the perspective of a particular plug-in, you can view Eclipse as having a default class loader called PluginClassLoader, even though this class loader actually resides several layers below the real system, or bootstrap, class loader.

A Concrete Example

MacroRunner is a plug-in that illustrates the concepts presented here. MacroRunner (available electronically; see "Resource Center," page 5) lets users select an arbitrary class from the filesystem and execute it within the currently running instance of Eclipse. With MacroRunner, it's easy to create and execute bits of Java code as macros that automate repetitive or detail-oriented chores in Eclipse.

A particular class can only be loaded once by any instance of a particular class loader. Since there is only ever one instance of the bootstrap, or system, class loader, it is not possible to load an arbitrary macro class from the filesystem, then subsequently modify the class and reload the identically named class at runtime.

An attempt to reload the modified class at runtime using the default system class loader will not result in an error, but at the same time it will not reload your class either; rather, the loader notices that a class by the same name has already been loaded and, therefore, looks to its internal cache for a copy of the class. This behavior is correct and expected, but for the purpose of executing Java macros inside Eclipse, it just does not work very well.

With a custom class loader, however, a new class loader instance can be created each time we need to load a macro class file, thereby circumventing the class cache. Think of it as follows: If classes are defined uniquely by their name together with the class loader instance that loaded them, then instantiating a new class loader allows for the loading of a modified version of the same class in this new class loader instance.
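This is the crux of how a macro loader can work. The following hedged sketch (the name is invented; it is not the article's actual MacroClassLoader) deliberately skips delegation for one target class, so each loader instance defines its own copy from the bytecode, and the resulting Class objects are distinct despite sharing a name:

```java
// Each instance of this loader defines its own copy of one target
// class instead of delegating, so creating a fresh loader instance
// effectively "reloads" the class past the per-loader cache.
class ReloadingLoader extends ClassLoader {
    private final String target;

    ReloadingLoader(String target) {
        super(ReloadingLoader.class.getClassLoader());
        this.target = target;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        if (!name.equals(target)) {
            return super.loadClass(name, resolve); // normal delegation
        }
        Class<?> cached = findLoadedClass(name);   // once per loader instance
        if (cached != null) return cached;
        // Read the target's bytecode from the class path and define a
        // fresh copy of the class inside this loader instance.
        String path = name.replace('.', '/') + ".class";
        try (java.io.InputStream in = getResourceAsStream(path)) {
            byte[] bytes = in.readAllBytes();
            return defineClass(name, bytes, 0, bytes.length);
        } catch (java.io.IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}
```

Two instances of this loader given the same target name yield two different Class objects, which is exactly the property a macro runner needs in order to pick up a recompiled macro. (A java.* class cannot be the target, as defineClass forbids that package.)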

In Figure 2, the System class loader has loaded two instances of a class loader CL, called CL1 and CL2. As CL1 and CL2 are two different instances of the same class loader, they can each load an instance of the same class A, labeled A1 and A2 above. The instances of A (A1 and A2) do not necessarily have to contain the same code, even though they are both instances of class A. It is possible to load an instance of class A using CL1, then modify the contents of class A on the filesystem, and load this new class A in another class loader instance CL2. At this point, two separate though identically named classes are loaded. In fact, it might be more appropriate to name the class instances A1 and A2 as CL1:A1 and CL2:A2, respectively, as a class instance is in the end defined both by the class name and the class loader that created the instance of the class.

Due to the aforementioned upwards visibility of classes through the class loader hierarchy, any classes loaded by a class loader have access to any classes that its parents can access. In Figure 2, CL1 can see class instance A1 along with any class instances loaded by its parent class loader, the system class loader. Likewise, CL2 can see class instance A2 along with any class instances loaded by the system class loader. CL1, however, cannot see class instance A2, nor can CL2 see class instance A1.

Inside MacroRunner

MacroRunner uses a plug-in created using the "Sample Action Set" sample plug-in of the Eclipse Custom Plugin Wizard as its base. This plug-in adds a menu entry to the Eclipse menu bar, and executes the MacroAction.run() method whenever the menu selection is chosen.

Inside this run() method, a simple input dialog is spawned querying the user for the absolute path to a precompiled Java class file to run within the current Eclipse environment. MacroRunner expects that the macro class file implements the java.lang.Runnable interface, and also that the class itself resides in the default Java package. After some simple error checking to validate that the path provided actually points to a file on the local filesystem, the loading of the class is initiated by the custom class loader, MacroClassLoader; see Listing One.

The process of instantiating a custom class loader and subsequently instantiating a new instance of a class is quite simple. Notice the check made after the

http://www.ddj.com Dr. Dobb’s Journal, May 2005 79

Figure 1: Hierarchical structure of class loaders. [Diagram: the bootstrap/system class loader sits above a custom class loader, which loads an instance of class A.]

Figure 2: System class loaders. [Diagram: the system class loader has loaded two class loader instances, CL1 and CL2; CL1 has loaded class instance A1, and CL2 has loaded class instance A2.]

instantiation of the macro class in Listing One. This check is made to ensure that the custom class loader MacroClassLoader actually succeeded in loading the class. Due to the delegating nature of Java class loaders, if the class classToLoad, located in the directory classLocation, lies on the CLASSPATH of the System class loader, then when the custom class loader MacroClassLoader delegates to its parent, the System loader loads the class, and efforts to load the class using the custom class loader have been in vain.

In the context of Eclipse, to ensure that this does not occur, you must be certain that the class being loaded is not visible to the parent Eclipse class loader. Because the MacroRunner plug-in project has no dependencies on any other Eclipse plug-ins (other than the org.eclipse.core.resources plug-in, which is required for interaction within the Eclipse framework), you can create your java.lang.Runnable class in any project outside of the MacroRunner project.

MacroRunner's Class Loader

The custom class loader implementation itself is straightforward, leveraging java.net.URLClassLoader for most of its functionality. The URLClassLoader provides a full class loader implementation accepting an array of java.net.URL instances as input to its constructor. This array of URLs, which can point to locations either on the local filesystem or on a network, is then used as a class search path when the class loader instance is asked to load a class.

Following the delegating pattern of Java class loading, the URLClassLoader implementation first attempts to delegate the loading of any class to its parent class loader. If the parent class loader cannot locate the particular class in question, the URLClassLoader itself tries to locate the class along its URL search path.

MacroClassLoader (Listing Two) simply extends java.net.URLClassLoader, providing a constructor accepting the path to the class to be run, as well as the parent class loader of the calling instance. With this information, a URL[] array of size 1 is created with the class location information converted from a simple path into a URL location. To learn more about Java URLs, consult the documentation for the java.net.URL class (http://java.sun.com/j2se/1.4.2/docs/api/java/net/URL.html).

Since the java.net.URLClassLoader implementation meets all of your needs, there is no need for any method modification, other than the slight simplification we have made to the constructor to hide the need for URLs from the application developer. If, however, a different behavior than the default were required, you could extend the URLClassLoader in a few different ways by creating a class descending from java.lang.ClassLoader (or any of its subclasses) and overriding one or more of the following methods:

• Class findClass(String name). The findClass() method is the preferred route through which to modify class loading behavior. This method resolves the .class file associated with the input name parameter, reads this class representation into an array of bytes, and returns a class instance representing this byte array through a call to the defineClass() method. The ability of application developers to modify the search paths of a class loader through an overridden findClass() method is what truly provides the power and flexibility of Java's dynamic class loading methodology.

• String findLibrary(String name) is an analogue to the findClass() method, except that this method can be overridden to resolve native libraries in a custom fashion, such as shared objects in UNIX-like operating systems or DLLs in Windows. Any library loaded via java.lang.System.loadLibrary() at runtime has the resolution of the library path accomplished through the findLibrary() method of the calling class's class loader.

• Class loadClass(String name). It is not generally good practice to override this method, as the enforcement of the delegating nature of Java class loading is performed here. If, however, a custom application required such a feature, the loadClass() method could be overridden to change the delegating nature of class loading for a particular class loader instance, so that the child class loader first attempted to load a class by itself before delegating to its parent.

• Class defineClass(String name, byte[] classBytes, int offset, int length). Though defineClass() is a final method and as such cannot be overridden, it warrants mention as it forms an integral part of the ClassLoader implementation. The defineClass() method accepts an array of bytes defining a .class file and returns a java.lang.Class instance representing that array of bytes, throwing a ClassFormatError on malformed input. The findClass() method should return a java.lang.Class instance created by this method.
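To make the findClass()/defineClass() relationship concrete, here is a minimal sketch of the override. This is my own example, not MacroRunner's code; the class name DirClassLoader and its root-directory field are invented. It resolves a .class file under a directory, reads its bytes, and defines the class:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of an overridden findClass(): resolve the .class file under a
// root directory, read its bytes, and turn them into a Class instance
// via the final defineClass() method.
public class DirClassLoader extends ClassLoader {
    private final Path root; // directory holding compiled .class files

    public DirClassLoader(Path root, ClassLoader parent) {
        super(parent);
        this.root = root;
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        // com.example.Foo -> <root>/com/example/Foo.class
        Path classFile = root.resolve(name.replace('.', '/') + ".class");
        try {
            byte[] bytes = Files.readAllBytes(classFile);
            return defineClass(name, bytes, 0, bytes.length);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}

Because loadClass() still delegates to the parent first, this findClass() only runs for classes the parent cannot see, which is exactly what the delegating model prescribes.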

To install the MacroRunner and MacroRunnerRunnable sample Java macro plug-ins, follow these steps:

1. Download MacroRunnerProjects.tar.gz(available electronically; see “ResourceCenter,” page 5).

2. Unzip and untar with your favorite utility or using GNU tar (or compatible) by issuing the following (or similar) command: tar -zxf MacroRunnerProjects.tar.gz.

3. In Eclipse, select File…Import from the main menu. When prompted, select "Existing Project into Workspace" and click Next. At the next screen, browse to the extracted MacroRunner directory and click Finish to import the project. Repeat this step for the MacroRunnerRunnable directory.

4. Both the MacroRunner and MacroRunnerRunnable projects should now be imported into your Eclipse workspace. To run the code, create and execute a runtime workbench configuration through the Eclipse Run…Run… menu entry (accepting all defaults for a new runtime workbench configuration).

5. When the runtime workbench comes up, if you cannot see the MacroRunner menu entry, then select Window…Customize Perspective…Other…

6. Ensure that the MacroRunner Demo list entry is checked before clicking OK to accept.

7. To execute the MacroRunner demo code, click the MacroRunner menu entry.

8. In the resulting file selection dialog, browse to the macro class you wish to run (samples are provided in the MacroRunnerRunnable plug-in bin directory), and click OK. Check the console output for any error messages if your macro does not run successfully.

References

ClassLoader API spec (http://java.sun.com/j2se/1.4.2/docs/api/java/lang/ClassLoader.html).

URLClassLoader API spec (http://java.sun.com/j2se/1.4.2/docs/api/java/net/URLClassLoader.html).

URL definition (http://www.ietf.org/rfc/rfc2396.txt).

—G.B.

Installing the MacroRunner Project

Listing One

try {
    // Instantiate the ClassLoader, passing the location (directory) of
    // the class to load and the parent class loader
    ClassLoader loader = new MacroClassLoader(classLocation,
            this.getClass().getClassLoader());

    // Load the class with the custom ClassLoader; we can now instantiate
    // it using the newInstance() method
    Class klass = loader.loadClass(classToLoad);

    // Instantiate a new instance of the loaded class - this is
    // equivalent to the "new" keyword
    Runnable loadedClass = (Runnable) klass.newInstance();

    // Check to make sure that our custom ClassLoader succeeded in
    // loading the class, and not the System class loader
    if (this.getClass().getClassLoader()
            == loadedClass.getClass().getClassLoader()) {
        // If we get here, then the class instance loadedClass was loaded
        // by the System class loader, not by our custom class loader!
        printErrorMsg(classToLoad,
                "Class was not loaded by the proper class loader");
        return;
    }
} catch (ClassCastException e) {
    printErrorMsg(classToLoad, "Class must implement java.lang.Runnable");
    return;
} catch (ClassNotFoundException e) {
    printErrorMsg(classToLoad,
            "Class could not be located at " + classLocation);
    return;
} catch (IllegalAccessException e) {
    printErrorMsg(classToLoad,
            "ClassLoader threw an IllegalAccessException");
    return;
} catch (InstantiationException e) {
    printErrorMsg(classToLoad,
            "ClassLoader threw an InstantiationException");
    return;
} catch (MalformedURLException e) {
    printErrorMsg(classToLoad,
            "ClassLoader threw a MalformedURLException");
    return;
}

Listing Two

public class MacroClassLoader extends java.net.URLClassLoader {

    public MacroClassLoader(String classLocation, ClassLoader parent)
            throws MalformedURLException {
        // Create a new URL[] array with one entry: a URL expressing the
        // location given by the String classLocation parameter. We assume
        // that the class location is on the local filesystem, so we
        // prepend the file:// (UNIX) or file:/// (Windows) URL identifier
        // and replace any Windows-style backslashes with URL-style
        // forward slashes
        super(new URL[] {
                new URL("file:"
                        + ((classLocation.charAt(0) == '/') ? "//" : "///")
                        + classLocation.replace('\\', '/') + "/") },
                parent);
    }
}

Listing Three

public class TestRunnableProjRename implements java.lang.Runnable {
    // (non-Javadoc)
    // @see java.lang.Runnable#run()
    public void run() {
        IWorkspace workspace = null;
        IWorkspaceRoot workspaceRoot = null;
        IProject[] projects = null;

        System.out.println("In TestRunnableProjRename...");

        workspace = ResourcesPlugin.getWorkspace();
        workspaceRoot = workspace.getRoot();
        projects = workspaceRoot.getProjects();

        // Iterate over all of the projects in the current workspace
        for (int i = 0; i < projects.length; i++) {
            // Closed projects should be ignored...
            if (!projects[i].isOpen())
                continue;

            // Rename all projects in workspace to $(PROJNAME)_Modified
            try {
                IProjectDescription description =
                        projects[i].getDescription();
                System.out.println("\tAttempting to change project "
                        + description.getName());
                description.setLocation(description.getLocation());
                description.setName(
                        description.getName().concat("_Modified"));

                // Actually perform the rename
                IResource resource = (IResource) projects[i];
                resource.move(description, true, false, null);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}

DDJ

A MacroRunner Sample Java Macro

To demonstrate how you can manipulate the Eclipse environment, I turn to an example that renames all of the projects in the current Eclipse workspace. Though not very useful, this example still demonstrates how the Eclipse internals are accessible to our macro file executed with the aid of our custom class loader in the MacroRunner plug-in.

Listing Three is a class implementing the java.lang.Runnable interface. The example leverages the Resource API of Eclipse in order to get a reference to the current Eclipse workspace, and to all of the projects in the workspace. It then iterates over all of the projects in the workspace, appending the String "_Modified" to each of the project names.

To see this macro code in action, compile the macro into a Java class file, run the MacroRunner example from within your Eclipse instance, and load the macro file via MacroRunner's Java macro class loading functionality (see the accompanying text box entitled "Installing the MacroRunner Project").

Common Custom Class Loader Pitfalls

Debugging custom class loaders isn't as hard as you may think. During development work at TimeSys, we ran across these common problems that may cause grief for you as well when working with custom class loaders:

Problem: Wrong class loaded. Your class loader fails to load a class that you wished it to load; because class loading is delegated to the parent class loader, your class does get loaded, but not in the correct context.

Solution: Call getClass().getClassLoader().getClass().getName() on the instance of the loaded class that you have; this returns the name of the class implementing the class loader that actually ended up loading the instance of this class. Once you have established that your class loader is failing to load the class properly, the next step should be to ensure that the class does not reside anywhere on the System class loader's CLASSPATH.
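That diagnostic can be wrapped in a small helper. This sketch (the class name LoaderCheck is mine) maps the null returned for bootstrap-defined classes to the string "bootstrap":

// Sketch: report which class loader defined the class of a given object.
// getClassLoader() returns null for bootstrap-defined classes, so that
// case is mapped to the string "bootstrap".
public class LoaderCheck {
    public static String loaderOf(Object o) {
        ClassLoader cl = o.getClass().getClassLoader();
        return (cl == null) ? "bootstrap" : cl.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(loaderOf("any string"));      // bootstrap
        System.out.println(loaderOf(new LoaderCheck())); // application loader
    }
}

Inside Eclipse, a macro instance loaded correctly should report your custom loader's class name here, not a PluginClassLoader.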

Problem: Parent class loader finds class first. URLClassLoader delegates to its parent class loader before attempting to load the class with your derived class loader.

Solution: This is actually the expected behavior: Java class loaders generally follow the delegating model, which offers the parent class loader the first attempt at loading any class. If, however, the parent class loader cannot find the class to be loaded, the loading of the class needs to be handled by your class loader. If you would like to make sure that your parent class loader in Eclipse does not load a particular class, make sure that the class lies in a plug-in external to the plug-in in which your class loading code resides, and also ensure that there are no interdependencies between the plug-ins.

Conclusion

The Eclipse environment has created its own class loaders so that plug-ins' class path requirements won't interfere with each other; instead of the class loader being a Singleton with respect to the running JVM, each plug-in has its own. When working with Eclipse, the plug-in manifest describes the class path for the plug-in.

While the stock class loader for Eclipse is adequate for most uses, the platform is flexible enough that you can still supply a custom class loader if necessary. The plug-in I describe in this article uses a custom class loader to load and execute an arbitrary class within the current Eclipse instance, and serves as an excellent tool to learn about the mechanics of creating your own class loader as well as being a great productivity tool for Eclipse.

DDJ

"Ah, word of no meaning! behind whose vast latitude of mere sound we intrench our ignorance…"

—Edgar Allan Poe, Ligeia

In taking the word "paradigms" as part of the title of this column when I launched it back in 1987, I was deliberately granting myself broad latitude in what I covered. The word had been popularized, or demonized in some circles, 25 years earlier by philosopher of science Thomas Kuhn in his landmark book, The Structure of Scientific Revolutions. In a postscript to the 1969 revision of that book, Kuhn acknowledged that he had used the word in a slippery fashion, letting it mean different things at different times. I took my inspiration from this, and determined to let the word mean anything I pleased from month to month. It's worked out pretty well, I think, allowing me to write about topics that impact computer programming with only a glancing blow. I could argue that I do this out of a belief that an occasional insight from the outside is needed to shake up our stagnant thinking, but mostly I just do it to entertain myself.

This month, in that spirit, I delve into a book by a professor of psychiatry at Johns Hopkins Medical School. As it happens, what this book has to say is surprisingly relevant to the paradigms we encounter in the computer industry. Although I notice that even in that sentence, I am using the word "paradigms" in more than one sense. It's such a conveniently slippery word.

Crazy Like a Fox

John D. Gartner has a favorite word, too, and while his word may not be as slippery as mine (and Kuhn's), he does get a lot of mileage out of it. His word is "hypomanic," and last October, he wrote to ask me to examine an advance copy of his book The Hypomanic Edge: The Link between (a little) Craziness and (a lot of) Success in America (Simon & Schuster, 2005; ISBN 0743243447) when it was available in a few months. I agreed, and in February, the book arrived.

Although it is commonplace to describe the frenzy of entrepreneurial activity and the stock market feeding that went on during the dot-com bubble as "manic," the use of the word is metaphorical. Nobody has seriously suggested that entrepreneurs and speculators are crazy merely by virtue of being entrepreneurs and speculators. But Gartner, a clinical psychologist, decided to take the term "manic" literally, or rather the related term "hypomanic," and see where that led.

Both "manic" and "hypomanic" are diagnostic categories defined in the Diagnostic and Statistical Manual of Mental Disorders of the American Psychiatric Association (usually referred to as DSM-IV). The crucial difference is that mania is a disabling condition that almost invariably leads to hospitalization, while hypomania is a condition that one can live with and function in society. Hypomania is not an illness, but "a temperament characterized by an elevated mood state that feels 'highly intoxicating, powerful, productive and desirable' to the hypomanic," but that doesn't burn out your nasal membranes.

"Hypomanics," Gartner says, "are brimming with infectious energy, irrational confidence, and really big ideas. They think, talk, move, and make decisions quickly. Anyone who slows them down with questions 'just doesn't get it.' Hypomanics are not crazy, but 'normal' is not the first word that comes to mind in describing them."

Gartner's first example of a hypomanic is one familiar to readers of this publication: Jim Clark, founder of Silicon Graphics, Netscape, and Healtheon. Jim Barksdale, the normal cofounder of Netscape, described his colleague Clark as "a manic who has his mania only partly under control." Another source characterized him as "a perpetual motion machine with a short attention span, forever hurtling at unsafe speeds in helicopters, planes, boats, and cars. When his forward motion is impeded, Clark becomes irritable and bored. In his search for the stimulation of 'the new thing,' he quickly loses interest in the companies he founds and tosses them into the laps of his bewildered employees."

At this point, you are probably reflecting that this description also fits someone else you could name. In fact, it fits a lot of people in this industry. What is truly weird is that such behavior seems to be adaptive.

Pilot Study

In the 1990s, Gartner decided to try a little pilot study to see if his hypothesis that American entrepreneurs are largely hypomanic had legs. He interviewed 10 Internet CEOs, asking them to judge which of a list of traits were typical, in their opinion, of an entrepreneur. The traits were those that define hypomania, although he didn't tell them that. Although his sample size was small, the results were dramatic: All of the entrepreneurs agreed that virtually all of the traits were typical traits of an entrepreneur, and did so emphatically, sometimes stating that they wished they could give a rating of 6 or 7 on the 5-point scale of agreement.

Rather than go on to further refine his hypothesis, Gartner thought about it further and decided to explore a slightly different theme: the relationship between being an American and being hypomanic. He began to entertain the idea that there was a genetic determinant of the characteristic American entrepreneurial spirit.

The Hypomanic Entrepreneur

Michael Swaine

PROGRAMMING PARADIGMS

Michael is editor-at-large for DDJ. He can be contacted at [email protected].

Although researchers at Johns Hopkins and elsewhere have found evidence of genetic determinants of mania and possibly of hypomania, it seems odd to suggest that there is any genetic determinant of American character. After all, America is the great melting pot, a nation of immigrants. Surely it is the last place you would look for a genetic component to national character.

But what if there is a genetic factor that predisposes one to take the rash step of abandoning one's home and moving across the ocean? And what if it is the same genetic factor that predisposes one toward other forms of bold behavior, such as starting a business from a plan sketched on a napkin and convincing venture capitalists to back you? If we are a nation of immigrants and descendants of immigrants, have we inherited an entrepreneurial spirit?

Americans work more hours than any other people in the world. Starting a business is a more respected activity in America than almost anywhere else. And failing in business carries very little stigma in America, in marked contrast with Europe and Japan. Is this American entrepreneurial character due to a large proportion of hypomanics in this country? That's what Gartner thinks.

In his book, Gartner profiles people from each century of America's 500-year history. The people he selects all played a big part in making America what it is. They also, Gartner hypothesized, were all hypomanic. To test his hypothesis, he consulted their biographers.

Christopher Columbus? "He was a stranger to doubt," according to biographer Gianni Granzotto. William Penn? Hypomanics always think big, and Penn biographer Paul Johnson confirms that "[e]verything in Pennsylvania was big from the start." Alexander Hamilton? He typified the hypomanic's supreme confidence and rash risk taking by walking brashly into cannon fire during an early Revolutionary War skirmish. "He wasn't so much brave as unafraid," Gartner says. Andrew Carnegie, Louis B. Mayer, geneticist Craig Venter: hypomanics all, by the compelling evidence that Gartner presents.

Are You Nuts?

In researching the personalities of each of these historical figures, Gartner presented their biographers with this checklist of hypomanic traits:

energy
restless
active
quick-thinking
jumps from idea to idea
distractible
fast-talking
talks a lot

Solution to "Jam Session," DDJ, April 2005.

1. We represent the 6 data bits followed by 4 check bits as follows:

b1 b2 b3 b4 b5 b6 c1 c2 c3 c4

Here is one possibility of what the check bits could do:

c1 is odd parity on b1, b2, b3, b4, b5, b6, c1, c4
c2 is odd parity on b1, b2, b3, b4, c3, c2
c3 is odd parity on b1, b2, c1, c3, c4
c4 is odd parity on b1, b5, b4, c4

The trick is to locate the parity tests that don't work if any bit has been flipped and to make sure these are all different. If all the parities are correct, then no single bit has been flipped. If there is an error in b1, then the parities corresponding to c1, c2, c3, and c4 will all be bad.

Error in b2: c1, c2, and c3 will be bad.
b3: c1, c2.
b4: c1, c2, c4.
b5: c1, c4.
b6: c1.
c1: c1, c3.
c2: c2.
c3: c2, c3.
c4: c3, c4.

2. If you know the offset between the bit flip to the first receiver and to the second is an odd number, then you can use:

b1, b2, b3, b4, b5, b6, b7, b8, Podd, Peven

where Podd ensures that there are an odd number of 1s among b1, b3, b5, b7, and Podd, and Peven ensures that there are an odd number of 1s among b2, b4, b6, b8, and Peven. Here is why this works: If no error is detected, then the data bits are correct. If the first receiver detects an error, say for Podd, then the second can detect an error only for Peven because of the odd offset. Therefore, one of the two receivers will receive an error-free Podd group and an error-free Peven group.
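The odd/even scheme is easy to check mechanically. This Java sketch is mine, not part of the published solution (the class name OddEvenParity is invented); it encodes the eight data bits with Podd and Peven and reports which parity group fails:

// Sketch of the two-parity scheme: Podd makes the number of 1s among
// b1, b3, b5, b7, and Podd odd; Peven does the same for b2, b4, b6, b8,
// and Peven. check() reports, per group, whether the parity test fails.
public class OddEvenParity {
    // b holds the eight data bits b1..b8 as 0/1 values
    public static int[] encode(int[] b) {
        int oddOnes = b[0] + b[2] + b[4] + b[6];   // b1, b3, b5, b7
        int evenOnes = b[1] + b[3] + b[5] + b[7];  // b2, b4, b6, b8
        int[] word = new int[10];
        System.arraycopy(b, 0, word, 0, 8);
        word[8] = (oddOnes % 2 == 0) ? 1 : 0;      // Podd forces an odd count
        word[9] = (evenOnes % 2 == 0) ? 1 : 0;     // Peven forces an odd count
        return word;
    }

    // Returns {oddGroupBad, evenGroupBad}: true means that check failed
    public static boolean[] check(int[] word) {
        int odd = word[0] + word[2] + word[4] + word[6] + word[8];
        int even = word[1] + word[3] + word[5] + word[7] + word[9];
        return new boolean[] { odd % 2 == 0, even % 2 == 0 };
    }
}

Flipping any single bit makes exactly one group fail, and with an odd offset the two receivers can never fail in the same group, which is the argument above.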

3. If you know the offset is 4 bits, then Ivan Rezanka showed how 3 check bits are enough. First note that with an offset of 4, the bits that could flip are bit 1 for the first receiver and bit 5 for the second, denoted (1,5), or (2,6), (3,7), (4,8), (5,9), (6,10), (7,1), (8,2), (9,3), or (10,4).

Here is a three check bit solution:

b1 b2 b3 b4 b5 b6 b7 c1 c2 c3
c1 is odd parity on b1, b4, b6, b7
c2 is odd parity on b2, b4, b5, b7
c3 is odd parity on b3, b4, b5, b6

If all parity bits check out for either receiver, we are done. If some pair changes, we detect as follows: If bit 1 to the first receiver and bit 5 to the second is flipped, then c1 for the first receiver and c2 and c3 for the second will be incorrect. We denote this by (c1, c2 and c3). Note that when merging these two signals, the reconstructor must recognize this ordering. So, we abbreviate this first situation as follows:

(1,5) -- (c1, c2 and c3)

and continue as follows:

(2,6) -- (c2, c1 and c3)
(3,7) -- (c3, c1 and c2)
(4,8) -- (c1 and c2 and c3, c1)
(5,9) -- (c2 and c3, c2)
(6,10) -- (c1 and c3, c3)
(7,1) -- (c1 and c2, c1)
(8,2) -- (c1, c2)
(9,3) -- (c2, c3)
(10,4) -- (c3, c1 and c2 and c3)

Note that each diagnostic is different.

Dr. Ecco Solution

dominates conversation
grandiose
feels destined
elated
charismatic
charming
attractive
irritable
explosive
suspicious
impulsive
acts on ideas immediately
risk taker (financial)
risk taker (physical)
risk taker (sexual)
sex drive
needs little sleep
dresses for attention

As was the case in his pilot study of entrepreneurs, he found a high degree of matching. I thought I'd see how this list applies to some characters familiar to readers of this publication, by consulting books written about them. Understand, I'm not doing serious research here. For one thing, since I was looking for material supporting a match, I was clearly biasing my investigation toward finding hypomania. Still, I thought it would be interesting to try the exercise.

Is Larry Ellison Hypomanic?

To start out, I decided to see how well Oracle's Larry Ellison fits the profile. My primary source on Larry was Mike Wilson's book The Difference Between God and Larry Ellison: God Doesn't Think He's Larry Ellison (Quill, 1998; ISBN 068816353X), and I found evidence of just about every one of Gartner's checklist items. Thus:

Talks a lot: "The man across the hall talked constantly. That is what Stuart Feigen remembers. The foaming white water of words, the rushing river of noise. Larry Ellison just never quits."

Energy, active: There's plenty of evidence of this, including the hair-raising yachting stories. Larry's a maniac.

Restless, jumps from idea to idea, fast-talking, dominates conversation: "He talked so fast that he mutilated some sentences and even some words…[W]hen he said the word 'graduating,' it sounded like 'gradjing.'"

Quick-thinking: "Things he doesn't know he picks up very, very quickly." "He is one of the most agile…minds I have ever met." Wilson also describes a scene in which Larry talked his way out of a speeding ticket by claiming to be a doctor on the way to the hospital to "witness a craniometry." To one witness to this act, it illustrated "just how quickly this guy could think on his feet." To you and me, it may illustrate something else.

Distractible: I'd say so. Consider this Ellison self-appraisal in which Larry contrasts his ability to stay fixed on a topic with that of Bill Gates: "I was talking [by phone] with Bill Gates about an issue, and he and I disagreed about something, and he got off the phone. He called back two hours later and continued the conversation. He had thought about it for two hours solidly. Which is something I would never do. I can't imagine."

Grandiose, feels destined: Wilson calls him "a myth of his own making," someone who "lived partly in a world of his own invention."

Elated: He is "always unflaggingly positive and optimistic."

Attractive, charismatic, charming: "Ellison's charms were such that even [an employee Larry let go just before his stock options would have made him a millionaire] said he still liked him." When he testified in court against an ex-girlfriend, the court staff pronounced him "charming." "Even as a teenager he was the kind of person whom other people followed…" And again, "He was the kind of person you would like to follow."

Irritable, explosive: "Ellison often erupted when someone did something he did not like or said something he considered stupid."

Suspicious: Do pre-nups count?

Impulsive, acts on ideas immediately: Wilson recounts the story of Larry driving home from a date, seeing a house that he liked, and ringing the doorbell and offering to write the owner a check on the spot. Applying for a job, the young Ellison wants to start immediately. He speaks of the "instant attraction" between himself and a woman.

Risk taker (financial): Well, there's the entire history of Oracle. And earlier in life, he "never worried about how the bills would be paid." "He admitted that he was 'cavalier' about spending money…"

Risk taker (physical): Let's just consult the index of Wilson's book, shall we? Here we find entries like "Ellison, bicycling accident of," "Ellison, body surfing accident of," and "Ellison, broken nose of." I think we get the picture.

Risk taker (sexual), sex drive: Wilson sums it up by saying that Larry "lived…unmonastically." I'm happy to leave it at that.

Dresses for attention: According to The Mac Observer, "Larry Ellison does tend to dress like a million…err…billion dollars. He wears an outstanding assortment of excellent suits, and we doubt many of them came from The Men's Warehouse."

Needs little sleep: This one's not clear, although his Gulfstream jet has reportedly been interfering with the sleep of 900,000 residents of San Jose.

Is Steve Jobs Hypomanic?

It seemed obvious to me that Steve Jobs would fit the profile, but after I looked through a few books and then consulted my own memory for the evidence, I wasn't so sure.

Energy, restless, active: Certainly. Apple's forgotten founder, Ron Wayne, said, "Steve Jobs was an absolute whirlwind and I had lost the energy you need to ride whirlwinds." [Owen Linzmayer's Apple Confidential]

Quick-thinking, jumps from idea to idea, dominates conversation: I've interviewed Steve, and I'll testify to all these traits. But distractible, fast-talking, talks a lot: It seems to me that these traits are at most a 3 or 4 in terms of fit, not a 5.

Grandiose, feels destined: "He would have made an excellent king of France." [Jef Raskin, quoted in Apple Confidential] Guy Kawasaki summed it up thus: "Steve is off the scale when it comes to chutzpah." [Kawasaki, The Macintosh Way] Alan Deutschman, The Second Coming of Steve Jobs: "His astonishing energy and charisma and chutzpah." But I'm not so sure he feels destined to change the world; I think he just feels supremely capable of doing it.

Charismatic, charming, irritable, explosive: In Steve, these traits are all part of the same thing. "Working for Steve was a terrifying and addictive experience. He would tell you that your work, your ideas, and sometimes your existence were worthless right to your face, right in front of everyone…Working for Steve was also ecstasy…We would have worked in the Macintosh division even if he'd given us Tang." [Kawasaki]

Attractive: Despite the glasses, thinning hair, and middle-aged bulge, to Deutschman, Steve at 40 is "still a handsome man."

Risk taker (financial): He's an entrepreneurial risk taker. I don't see much evidence that he's a financial risk taker. I think there's a difference.

Risk taker (physical): Not really.

Risk taker (sexual), sex drive: On this point, Steve doesn't fit the profile. I'm a journalist; it's my job to know this kind of stuff. And I'm confident of my sources on this. And I'm not naming names.

Needs little sleep: I couldn't find anything on this.

Dresses for attention: Maybe he does, but the impression is that he dresses to please himself. At times in the past, he has appeared "slovenly" [Deutschman], but today he favors those black mock turtlenecks and jeans. Surely, if he were dressing for attention, he would vary the formula.

I’m not saying that Steve Jobs is normal. I just think that he requires his own DSM-IV classification.

DDJ

84 Dr. Dobb’s Journal, May 2005 http://www.ddj.com

“All of the entrepreneurs agreed that virtually all of the traits were typical traits of an entrepreneur”


Bluetooth and the “no wires” short-range networks you can build with it have always sounded like a great idea. First engineered by handset companies as a way for mobile phones to talk to headsets, Bluetooth 1.0 (and, soon after, 1.1) was, for a long time, a solution in search of a problem. While Bluetooth looked like a great way to build a Personal Area Network of all the gear you might carry, it sold tens and tens of units in the U.S. True, millions of Bluetooth-equipped handsets sold, mostly in Europe, but there were few accessories, and it wasn’t easy to configure. Early Bluetooth protocol stacks were buggy and had security problems.

That’s finally changed. Lots of GSM phones come with Bluetooth, and there are dozens of Bluetooth headsets for sale. Mac laptops and HP TabletPCs have supported it for two years, and we have Bluetooth GPS receivers. The PalmOne Treo 650 has Bluetooth. More importantly for DDJ readers, Windows XP directly supports Bluetooth 1.1 devices, and many Windows laptops have it built in. Chaos Manor Associate Editor Dan Spisak uses his Sony/Ericsson Bluetooth-enabled phone as an Internet link for his 15-inch PowerBook. In short, it’s a viable communications technology, though its configuration is still more complex than it should be.

While we weren’t looking, Bluetooth 2.0 arrived. The first news we saw was from Apple, which refreshed its PowerBooks with “Bluetooth 2.0 + Enhanced Data Rate” (http://www.apple.com/bluetooth/), supporting up to 3-Mbps conversations. Of course, you’ll need another device supporting Bluetooth 2.0 to take advantage of this improvement. Alas, we can’t find any now, but you can expect announcements at CTIA (http://www.ctiawireless.com/), the big cellular communications show.

A quick check of the Bluetooth official site confirms that the 2.0 spec was announced in November 2004 (http://www.bluetooth.com/news/sigreleases.asp?A=2&PID=1437&ARC=1&ofs=), with chipsets available now from Broadcom and CSR, and upcoming from RF Micro Devices. Assuming the promises aren’t overhyped, the higher performance of Bluetooth can keep it relevant, particularly for headsets and phone-to-computer networking, for which Wi-Fi is ill suited. It’s also fast enough to support new devices, like tiny Bluetooth-enabled video cameras (both a blessing and a curse in my book).

There’s no real reason to go out looking for Bluetooth 2.0 yet, but since it’s backward compatible, you don’t need to avoid it either. Like gigabit Ethernet, it’s coming, and you probably don’t have to pay much attention. One day, you’ll just have it.

When it Just Works, Wi-Fi is a boon, a blessing, more than a convenience. Too often, though, range limits, dropouts, and interference cause me to mutter under my breath and look for an Ethernet port. Chaos Manor isn’t tiny; from the Great Hall upstairs to the back of the house where we watch TV is about 75 feet and several thick walls away. The TabletPC just doesn’t connect to an access point (AP) from that far. My wife is understanding, but even she would find the idea of scattering four or five access points in the house objectionable, merely so I could check e-mail from the back of the house.

Many 802.11b/g add-in cards and most advanced access points have dual antennas, using a voting receiver to choose the best signal for whatever device they’re talking to. This “diversity receiver” approach is common in large radio systems, from cellular to public safety. The antenna with the largest signal “wins” with radio sites atop multiple mountains or big masts.
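The “largest signal wins” vote is simple enough to sketch in a few lines. This toy model (the function name and the RSSI figures are invented for illustration, not taken from any real driver) just picks whichever antenna currently reports the strongest received signal:

```python
# Toy sketch of a diversity (voting) receiver: among several antennas,
# use whichever one currently reports the strongest signal.
# Names and numbers are illustrative only.

def best_antenna(rssi_dbm):
    """Return the index of the antenna with the strongest RSSI.

    rssi_dbm -- list of received signal strengths in dBm
                (less negative = stronger).
    """
    return max(range(len(rssi_dbm)), key=lambda i: rssi_dbm[i])

# Three antennas, sampled once; the middle one is strongest.
readings = [-71.0, -58.5, -83.2]
print(best_antenna(readings))  # -> 1
```

A real receiver runs this vote continuously, many times a second, which is what lets it ride out fades on any single antenna.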

Just as Bluetooth has evolved, so has Wi-Fi. We have some good news from the front, in the form of the Belkin Pre-N Wi-Fi Access Point (AP) and PC Card (http://catalog.belkin.com/IWCatProductPage.process?Product_Id=184316). These devices use three receive and two transmit antennas. The AP combines them into three stubby (4-inch) antennas sticking up from the front of the unit. (The PC Card has a slightly thicker than usual “bump” with the antennas in it, but is otherwise unremarkable.) More antennas mean more possible antenna paths from sender to receiver, which are checked for the strongest signal multiple times a second. This technology is called “Multiple Input/Multiple Output” (MIMO), and our tests show that it’s more than hype.

Bluetooth On the Move

Jerry Pournelle

CHAOS MANOR

Jerry is a science-fiction writer and senior contributing editor to BYTE.com. You can contact him at [email protected].

We installed the Belkin AP in the Great Hall, which is upstairs and in the front of Chaos Manor. Associate Editor Dan Spisak was impressed with the AP’s setup; he set up WPA (Wi-Fi Protected Access) for security, I associated the Compaq/H-P TabletPC with the network, typed in the WPA passphrase, and was immediately connected using the TabletPC’s built-in 802.11 networking (MIMO is backward compatible with both 802.11b/11g). I then carried the TabletPC to all parts of the house, out in the yard, back behind the pool, and down the back street. Connectivity was ridiculously good: three bars of signal strength way down past the next-door neighbor’s house, in a place that had zero signal using a D-Link system, even though that one had an external doughnut antenna.

Belkin claims the best performance, though, if both ends are using Pre-N gear. So Alex installed the Pre-N PC Card in his laptop, a four-year-old Dell Inspiron 7500 running Windows XP. He walked around half the block while running a ping trace, with only a few dropouts the entire time. Even four houses down, with the AP on the other side of the house from him, he got a good enough signal to check e-mail. At extreme range, the link did drop, but it reestablished itself after a few seconds. Reconnections seemed much quicker than with older gear.

We don’t know the relative power consumption of the Pre-N gear, but its first-generation PC Card runs hot: too hot to touch when removed after half an hour of use. The PC Card’s control panel has a signal strength slider, so you can manually adjust transmit power (and power consumption) as needed.

Belkin has been shipping its Pre-N gear since October 2004; at CES in January, both Linksys and D-Link announced MIMO equipment, with others sure to follow. Linksys’s Wireless-G Broadband Router with SRX (http://www.linksys.com/products/product.asp?prid=670&scid=35) has three antennas, all at right angles to each other, which my radio engineering acquaintances say could well maximize the chances of a good connection. Still, we’ve only tested the Belkin gear, and recommend it if you need more range. At the very show where Linksys and D-Link announced their MIMO products, the Belkin Pre-N system was outperforming everything in the room, with signals available way down the hall in another ballroom. We were impressed then, and we remain so.

Pre-N, by the way, refers to equipment designed toward the emerging 802.11n Standard, which should be finalized later this year (see http://grouper.ieee.org/groups/802/11/Reports/tgn_update.htm for the official story). Be warned: There’s no guarantee that Pre-N gear will be 100-percent compatible with the new Standard, and it is somewhat more expensive than 802.11g.

The extra range does mean you’ll be more vulnerable. With older APs, someone would have had to park right outside Chaos Manor to steal a signal. Bob Thompson points out that a directional receiving antenna made from a Pringles can will let potential intruders intercept your wireless installation from considerably further away, and of course he’s right. In any event, the Pre-N gear lights up half the block to any good Pre-N receiving antenna.

This is all the more reason to use WPA, available on both XP and Mac OS X (though there are currently no Mac drivers for the Belkin Pre-N gear). For Mac, just at the moment, you’ll have to use the older WEP, which is harder to set up but more secure than nothing. When we were setting up our wireless, we discovered our neighbor has an unsecured wireless net; I could jump onto it and steal his bandwidth were I so inclined.

Regarding WEP: It’s better than nothing only so long as you’re dealing with nosy neighbors and amateurs. The WEP protocol has been cracked, and there are now downloadable programs for automating the process of breaking into WEP-secured wireless systems. Using these requires gathering a significant number of data packets and running the analysis, but none of this is beyond the ability of any serious data or identity thief. (For more on this, see http://www.securityfocus.com/infocus/1814 and http://www.informit.com/articles/article.asp?p=27666.)

Enough on security. If you want the extra range, and don’t want to drop in more APs, we recommend the Belkin Pre-N gear. Just remember to set it up securely, which you should do anyway. Expect further improvements, better compatibility, and higher speed from newer products.

Belkin Pre-N competes well on price, too. We’ve converted the Chaos Manor wireless establishment to Belkin Pre-N, and there’s no part of the house I can’t check e-mail from. It’s WPA, not WEP, protected. Recommended.

Winding Down

The Game of the Month remains Everquest II. Sony has done this one right. There are still problems with the crafting system: Player-made equipment can’t really compete with stuff you can get from quests, and that’s sad. It’s also tedious to do crafting, and the lack of lockers and other conveniences in the crafting establishments doesn’t help a bit. Finally, the marketing system sucks dead bunnies: They really need to go over to something like the Star Wars Galaxies marketing system, with vendors accepting wares for sale on consignment in exchange for a small markup. This nonsense about having to sit in your hotel room waiting for customers is boring and needless. On the other hand, Sony has added many new quests and features, and has done well by the adventurer class. Now it’s time for them to look at the problems of crafters and merchants.

The first Computer Book of the Month is Small Websites, Great Results, by Doug Addison (Paraglyph Press, 2004; ISBN 1932111905). This is a book for small-business people who need a web site to advertise or sell their wares, and haven’t the time to learn web design from the ground up. The other Computer Books of the Month are the series from O’Reilly & Associates called “Personal Trainer…” There is one on Windows XP, which I doubt anyone reading this will find useful, but there are two others, PowerPoint 2003 Personal Trainer (ISBN 0596008554) and Excel 2003 Personal Trainer (ISBN 0596008538), that can be really useful for folks who have to use those programs and don’t know much about them.

DDJ

“While we weren’t looking, Bluetooth 2.0 arrived”


The supreme misfortune is when theory outstrips performance.

—Leonardo da Vinci

Leonardo was describing how light and shade affect the colors on a sphere, not how to design an embedded system, but his precept remains painfully true for us today, five centuries later. We have no shortage of theories describing how to create reliable systems, but reality continues to smack us upside the head.

Theory says that design reviews, of both software and hardware, weed out errors ranging from incomplete and misleading specifications and bad implementation choices, all the way down to simple typos. The reality is that many eyes often see what should be there, not what’s actually in place.

Theory says that the choice of language affects the overall error rate, as (for example) C and its descendants permit errors in places where Java and its relations don’t even have places. Java is not a silver bullet, although it is a step in the right direction.

Several recent, large-scale space missions provide perfect examples of how errors can lurk undiscovered in even the most reliability-conscious programs. In theory, the procedures applied during the programs should preclude these errors, but, in practice, even one overlooked error can be catastrophic.

Let’s examine some in-flight errors, then see what Java becomes when it’s ready to take flight.

Errors of Commission

One of the fundamental truths of hardware design is that stuff breaks. When your mission requires high reliability, you must include redundant hardware, detect device failures, and activate those backup gizmos. Spacecraft take this to an extreme because repair isn’t possible: The hardware must be both inherently reliable to reduce the number of failures and include spares so the mission can continue despite failures.

NASA’s Genesis spacecraft spent three years in space collecting solar-wind particles in silicon, diamond, germanium, sapphire, aluminum, glass, and gold sheets. After reentering the Earth’s atmosphere, the craft would deploy a parasail and be snagged in mid-air by a helicopter to avoid any shock or contamination caused by ground contact. In fact, the sample collection capsule would be opened in a Class-10 clean room.

From all reports, the mission proceeded flawlessly, right up to the moment when the first drogue parachute should have deployed. Something was obviously wrong, though, as the long-range tracking camera showed the disk-shaped capsule tumbling wildly without a parachute. Figure 1 shows the result: Genesis smacked into the desert floor at just under 200 mph, shattering the collectors, cracking the sample capsule, and coating everything with dried mud.

Two redundant avionics packages, each with two of the accelerometer switches in Figure 2, failed to detonate the parachute-ejection charges. The switches should have activated at specific accelerations during reentry, but all four failed.

Subsequent investigation revealed that the switches were installed upside-down on the circuit boards. The expected acceleration simply held them more firmly inactive.

Although your first reaction might be to blame the schlub with the soldering iron, the Genesis Mishap Board discovered that the original design specified the wrong orientation. Everyone who reviewed the design, and surely their number was legion, evidently assumed the schematic diagram and board stuffing guide were correct.

Circuit-design programs include a logical symbol that shows what the part does, a physical symbol that shows what it looks like, and a list linking logical and physical pins. A single part may be available in several different packages, so you cannot assume a simple relationship between the schematic symbol and the silkscreen image on the circuit board.

When you create a new part for a circuit-design program’s library, you draw its logical and physical symbols, then create the pin list. Large organizations have departments doing this stuff for a living, but the task always boils down to somebody reading the part’s datasheet, drawing pictures, and copying pin numbers.

The accelerometer switch in Figure 2 is about as simple a part as can be: two leads on a can. Inside the hermetically sealed can is a weight riding on a carefully calibrated spring (that’s my guess, anyway). Unlike a resistor, however, the can is not symmetric: Orientation matters! The datasheet surely describes which way the acceleration must be applied to activate the switch, but keeping acceleration, deceleration, board orientation, and trajectory all correctly aligned is a challenge.

If I were creating that library part, I’d want to attach an ohmmeter to a real switch and give it a few vigorous shakes to be sure I knew which end was supposed to be up. Maybe that happened and something else went wrong? Maybe the board orientation changed after the first layout?

What has not been revealed so far is how the error managed to pass all the tests presumably applied to the overall assembly. I find it hard to believe that nobody applied the appropriate acceleration to the complete sample return capsule to activate the switches, but so it goes.

Errors of Omission

The eminently successful Cassini-Huygens mission to Saturn provides an agonizing demonstration of how things go wrong despite the most rigorous checking and verification procedures known to mankind. What’s really painful is that a second error very nearly negated an amazing fix for the first error.

Reliability: The Hard and the Soft

Ed Nisley

EMBEDDED SPACE

Ed’s an EE, PE, and author in Poughkeepsie, NY. Contact him at [email protected] with “Dr Dobbs” in the subject to avoid spam filters.



The mission plan called for the Huygens probe to transmit data to Cassini through two separate radio channels while entering Titan’s atmosphere and landing on its surface. Because the probe decelerates relative to the spacecraft, the radio signals are Doppler-shifted: Cassini approaches Titan at about 5.5 km/s as Huygens floats downward under its parachute.

The RF receiver and data demodulator circuitry had been used successfully in Earth-orbiting satellites, so they were well-understood designs with a good track record. The Italian firm that provided the package refused to document the circuitry, asserting that it contained proprietary IP, and NASA and JPL accepted what was literally a black-box subsystem, acting under the reasonable assumption that it would work as required.

System tests throughout the decade-long process of designing and building the craft showed no problems. As James Oberg described the situation in IEEE Spectrum, “However, a proposal for a so-called full-up high-fidelity test of the radio link between the probes (where every system is subjected to a simulation of the exact signals and conditions it will experience during flight) had been rejected because it would have required disassembly of some of the communications components…The reassembled spacecraft would then have had to undergo exhaustive and expensive recertification.”

Cassini-Huygens launched in October 1997 and spent the next seven years executing precise gravity-boosted flybys of Earth, Venus, and Jupiter on the way to Saturn. Ah, if only we had the propulsion of any third-rate science-fiction rocket ship!

Although Cassini’s radio receivers could handle the expected 38-kHz Doppler shift, which is a more-or-less straightforward matter of bandwidth in the RF chain, the demodulator converting the received signal into data bits was designed for a nominal 8192-b/s data rate. A relative velocity not only Doppler-shifts the RF carrier frequency, it also affects the effective data rate and, alas, a 5.5-km/s speed differential puts the data rate well outside the demodulator’s limits. The original mission plan would have methodically chopped the data stream to garbage.
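The sizes involved are easy to check with back-of-the-envelope arithmetic. The 5.5-km/s closing speed and 8192-b/s rate are from the column; the 2.098-GHz S-band carrier below is an assumption for illustration. The fractional Doppler shift is simply v/c, applied to both the carrier and the symbol clock:

```python
# Back-of-the-envelope Doppler arithmetic for the Huygens-to-Cassini link.
# Closing speed and data rate come from the text; the 2.098-GHz carrier
# is an assumed S-band figure, used only to illustrate the scale.

C = 299_792_458.0      # speed of light, m/s
v = 5_500.0            # Cassini-Huygens closing speed, m/s
f_carrier = 2.098e9    # assumed S-band carrier, Hz
bit_rate = 8192.0      # nominal data rate, b/s

fraction = v / C                      # ~1.83e-5 fractional shift
carrier_shift = f_carrier * fraction  # Hz
rate_shift = bit_rate * fraction      # b/s

print(f"fractional shift: {fraction:.3e}")
print(f"carrier Doppler:  {carrier_shift / 1e3:.1f} kHz")
print(f"data-rate shift:  {rate_shift:.2f} b/s")
```

With these numbers, the carrier shift lands right around the 38 kHz the column quotes, while the symbol clock moves by only about 0.15 b/s; that such a small offset still fell outside the demodulator’s limits shows how narrow its data-rate tolerance must have been.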

Oberg tells an antiDilbertian story of an engineering manager (!) following a hunch, an engineer designing a complex test and making on-the-fly adjustments to verify another hunch, carefully documenting the results and presenting them to a disbelieving organization, and eventually convincing a recalcitrant bureaucracy that Something Really Was Wrong. Fixing the demodulator would be a Simple Matter of Firmware if it were in hand, but there was no way to insert the new code without dismantling the probe.

The solution involved orbital mechanics: They adjusted Cassini’s Saturn and Titan approach trajectory to reduce the downrange velocity difference and, thus, the Doppler shift and the data rate change. As you probably saw in January, the Huygens probe worked perfectly.

Half the probe’s image data was lost anyway, because somebody forgot to turn on one of the two receivers. It seems the command to activate Channel A wasn’t in the sequence controlling Cassini during Huygens’s Titan entry, and the data stream fell on a deaf ear after all.

CBS News quoted David Southwood, director of science for the European Space Agency, as saying, “We should remember we’re human and we should learn lessons, so I will institute an ESA inquiry on how the command came to be missing.”

High-Reliability Language

What makes those horror stories particularly interesting is that they have nothing to do with software. Unlike most of the conspicuous embedded-systems failures in recent years, the software worked perfectly. There’s a simple reason for that: The development process that produces aerospace-grade software is designed to prevent errors, so only the truly bizarre goofs remain.

Back on earth, if an error in your code can drop your customer’s aircraft from the sky, the FAA requires certification to DO-178B. Achieving DO-178B Level A certification, mandatory for flight-critical software, involves producing literally dozens of weighty documents that itemize every aspect of your software-development process.

DO-178B certification applies to the entire system, not just a functional block or routine. In fact, demonstrating that your high-level source code implements the specifications is necessary, but not sufficient, as you must also demonstrate complete test coverage on the corresponding assembly-language output: no dead code, no untested branches, no uninitialized variables, no compiler-generated weirdness, no nothing!

The requirements go further. You must ensure both temporal and spatial behavior by demonstrating that your code cannot stall in an infinite loop, that it will respond within the specified time limits, and that it cannot scribble over itself or anything else. You must do this, not by hand-waving arguments, but by evaluating the code: routine by routine, line by line, even the library functions and compiler boilerplate that you didn’t write.

Pop Quiz: List all the C++ compilers that produced output easily certified to DO-178B. Write legibly.

The requirements go even further. Your test cases must stimulate the entire as-built program; you may use unit test cases during development, but they’re not acceptable for certification. Each test case must relate to a specific system-level functional requirement; if the complete set of tests doesn’t produce the proper coverage, then you have a specifications problem.

Figure 1: Despite a rather hard landing, some useful science data is emerging from the Genesis mission. Photo courtesy of NASA.

Figure 2: All four accelerometer switches in the Genesis sample return capsule remained inactive during reentry. The error occurred in the design stage and survived every subsequent review and test. Photo courtesy of NASA.

Dynamic memory allocation? Multithreaded applications? Asynchronous exceptions? Only if you can prove correct operation to someone who isn’t ruffled by hand-powered breezes.

Ada compilers have been producing DO-178B certifiable code since their earliest days, but the entire cadre of Ada programmers is reaching retirement age with no replacements on tap. C and C++ programmers are cheap and readily available, but the ensuing code presents nightmarish certification problems.

Java, on the other hand, is sufficiently denatured to eliminate most C and C++ horrors, while including many high-level features people have come to depend on. Can Java be adapted for DO-178B certification?

Yes, indeed, as versions of Java and its underlying JVM designed for high-reliability and real-time embedded systems are now inching toward approval by Sun and the Java Community. To meet the requirements for safety-critical DO-178B certification, though, here’s a short list of what happens to a language that’s already widely regarded as quite safe…

Dynamic class loading and dynamic memory allocation Go Away, which also eliminates the need for garbage collection. That allows hard, real-time performance and prevents a host of memory allocation issues.

All temporary data is stored on the stack, so the compiler can guarantee that the program never dereferences a dangling pointer. You hardly ever hear of the stack in ordinary Java programming, do you?

As a side effect, those changes eliminate 99+ percent of the JDK library routines, paring it down to about 10K more-or-less easily verified lines of code. Nearly everything a Java programmer has come to expect in the JDK won’t be there for a safety-critical application.

Asynchronous control transfers, meaning all that exception handling stuff, Go Away. You must figure out how to handle errors with provably correct results and timings.

You can have separate threads, but without time slicing between them. You must ensure that overall system timing works properly regardless of how the threads play out. Rate Monotonic Scheduling is your friend.
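Rate Monotonic Scheduling itself is simple to state: assign fixed priorities in order of period, shortest period highest, and (by the classic Liu and Layland result) a set of n periodic tasks is guaranteed schedulable if total CPU utilization stays under n(2^(1/n) - 1). A small sketch, with task names and timings invented for illustration:

```python
# Minimal Rate Monotonic Scheduling sketch: shorter period -> higher
# priority, plus the Liu & Layland utilization bound n*(2**(1/n) - 1).
# Task names and timings below are made up for illustration.

def rms_priorities(tasks):
    """tasks: list of (name, period_ms, worst_case_exec_ms).
    Returns names ordered highest priority first (shortest period)."""
    return [name for name, period, _ in sorted(tasks, key=lambda t: t[1])]

def rms_schedulable(tasks):
    """Sufficient (not necessary) test: utilization <= n*(2**(1/n) - 1)."""
    n = len(tasks)
    utilization = sum(exec_ms / period for _, period, exec_ms in tasks)
    bound = n * (2 ** (1 / n) - 1)
    return utilization <= bound

tasks = [("telemetry", 100, 10), ("control", 20, 5), ("logging", 500, 50)]
print(rms_priorities(tasks))   # -> ['control', 'telemetry', 'logging']
print(rms_schedulable(tasks))  # utilization 0.45 vs bound ~0.78 -> True
```

The bound is conservative: task sets above it may still meet their deadlines, but below it the timing guarantee holds without any per-case analysis, which is exactly the kind of provable statement DO-178B demands.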

Two interesting side effects fall out naturally from all that paring: The overall code footprint drops 99.5 percent to about 100 KB, and the performance jumps to about 90 percent of comparable C projects. For embedded-systems code, those are highly desirable attributes.

Now, if your first reaction is to reject programming in such a straitjacket, go back and reread the first part of this column. Would those projects have fared as well with your code in charge?

If your second reaction is to wonder what it would take to get your code up to spec, you’re thinking the right thoughts. Full-throttle DO-178B certification isn’t appropriate for most projects, but perhaps it’s time to start rejecting structural complexity in favor of simple languages and routines that, well, just get the job done.

Reentry Checklist

Remember that my intent is not to slag NASA or ESA, but to highlight the extreme difficulty of catching that last error. Those organizations have an incredibly good and very public track record; it’s likely that we’d all do better if our track records were similarly public, painful though it may be.

Read Leonardo’s track record and be awed: The Notebooks of Leonardo Da Vinci (Definitive Edition in One Volume), edited by Edward MacCurdy, Konecky & Konecky, 2003. Even 500 years later, you’ll feel inadequate.

Watch the Genesis capsule reentry at http://anon.nasa-global.speedera.net/anon.nasa-global/genesis/genesis.mov. A note on the Genesis Mishap Board’s preliminary report is at http://discovery.nasa.gov/news_101804.html.

The IEEE Spectrum article describing the Cassini receiver demodulator error is at http://www.spectrum.ieee.org/WEBONLY/publicfeature/oct04/1004titan.html, and the ESA version is available at http://www.esa.int/spacecraftops/ESOC-Article-fullArticle_par-40_1103125842574.html.

IEEE Spectrum describes the Cassini receiver command error at http://www.spectrum.ieee.org/WEBONLY/wonews/jan05/0105nhuygens.html, and a CBS News version may still be at http://cbsnews.cbs.com/network/news/space/recent.html.

Just read the list of DO-178B’s documentation requirements to show how little you’re doing now: http://www.lynuxworks.com/solutions/milaero/do-178b.php3. Because DO-178B describes a set of processes, rather than prescribing a set of tests, it’s far more complex than I’ve described here.

Kelvin Nilson of Aonix describes some aspects of real-time Java at http://www.stsc.hill.af.mil/crosstalk/2004/12/index.html. The illegibly small GIF table on this page compares ordinary and real-time Java: http://www.rtcmagazine.com/home/printthis.php?id=100111.

DDJ



PROGRAMMER’S BOOKSHELF

Computer book sales have been dropping steadily for the past few years. This isn’t just because the dot-com boom is over; it also reflects the fact that most of us can find more things on Google, faster than in a pile of thinly sliced trees.

Coincidentally (or perhaps not), fewer massive, poorly edited tomes are landing on my desk today than in 1998. Instead, I am getting more books about the “how” of programming, more tutorials, and more analysis. Alistair Cockburn’s Crystal Clear is a good example. It describes an agile development methodology aimed at medium-sized teams working on a two- to three-month cycle. Cockburn combines a refreshing lack of hype with real-world stories; more importantly, he repeatedly emphasizes the notion that methodologies are signposts, not destinations, and that what matters is not dotting i’s and crossing t’s, but achieving results. His writing style is rather dry, but after all the shouting and counter-shouting about Extreme Programming in the past few years, that’s no bad thing.

Johanna Rothman’s Hiring the Best Knowledge Workers, Techies & Nerds is another “how to” book. As you can guess from the title, its focus is how to find and hire programmers and other technical staff. Everything she says is common sense: prioritize requirements, use technical quizzes to determine whether résumé claims are overblown, make sure the people you’re hiring will fit into your existing team. But unless you’ve been making hiring decisions for several years straight, you’re probably only doing half of the things she suggests. I would have liked some discussion of how to handle nondisclosure and intellectual property issues during hiring, and a chapter on how to let people go would have been welcome, too, but those are minor points. If you’ve come to management from a technical background, do yourself a favor: At $45.00, this book may be one of the best investments you make this year.

Ira and Nate Forman’s Java Reflection in Action is as “hard” as the previous two books are “soft.” Over the course of 10 chapters, the authors explore the Java reflection API, a powerful abstraction layer that lets you treat classes as objects, and invoke their methods in flexible, generic ways. Every large Java library or application I know uses reflection to load extensions; the more advanced ones use it to construct or customize classes on-the-fly, inspect the call stack, and profile performance. This book contains the best discussion of Java’s dynamic proxies that I’ve ever come across, and also has notes on relevant new features in Java 1.5. It’s not for the faint of heart, but is required reading for anyone who’s serious about making Java work for them.

Neil Jones and Pavel Pevzner’s Introduction to Bioinformatics Algorithms also isn’t for the faint of heart, although it does take programmers’ ignorance of biology, and biologists’ ignorance of computer science, into account. If you’re a biologist, this book is an all-in-one survey of the core concepts from theoretical computer science that are revolutionizing your discipline. If you’re a programmer, it will show you how what you know about sorting, searching, and graph theory can be applied in DNA sequencing, evolutionary biology, and other areas. There’s more math than code, but lots of diagrams and clear explanations make it all digestible. And hey, I’ve always wondered how Batman gets dressed in the morning…

Last, but not least, is Hans Langtangen’s Python Scripting for Computational Science. The book’s aim is to show scientists and engineers with little formal training in programming how Python can make their lives better. Regular expressions, numerical arrays, persistence, the basics of GUI and web programming, interfacing to C, C++, and Fortran: it’s all here, along with hundreds of short example programs. Some readers may be intimidated by the book’s weight and the dense page layout, but what really made me blink was that I didn’t find a single typo or error. It’s a great achievement, and a great resource for anyone doing scientific programming.

And now, a self-serving footnote: When I’m not writing for DDJ, I supervise undergraduate programming projects at the University of Toronto. By the time you read this, a description of how those projects are run will be up at http://pyre.third-bit.com/. I’d be grateful for your comments.

DDJ

Good Books in Bunches
Gregory V. Wilson

Gregory is a DDJ contributing editor and can be contacted at [email protected].

http://www.ddj.com Dr. Dobb’s Journal, May 2005 91

Crystal Clear: A Human-Powered Methodology for Small Teams

Alistair Cockburn
Addison-Wesley, 2004
336 pp., $34.95
ISBN 0201699478

Java Reflection in Action
Ira R. Forman and Nate Forman
Manning, 2004
273 pp., $44.95
ISBN 1932394184

An Introduction to Bioinformatics Algorithms
Neil C. Jones and Pavel A. Pevzner
MIT Press, 2004
435 pp., $55.00
ISBN 0262101068

Python Scripting for Computational Science
Hans Petter Langtangen
Springer, 2004
726 pp., $69.95
ISBN 3540435085

Hiring the Best Knowledge Workers, Techies & Nerds: The Secrets & Science of Hiring Technical People
Johanna Rothman
Dorset House, 2004
352 pp., $37.95
ISBN 0932633595


Testing Technologies is offering TTworkbench, a test development and integration environment based on the standardized test specification and implementation language TTCN-3. The environment offers modules such as a TTCN-3 core language editor, a TTCN-3-to-Java compiler, and an embedded test-management suite. Features include context-specific syntax highlighting, code navigation support, immediate validation of test specifications, filtering of logging events, online/offline logging, and test data views with analysis support for validity.

Testing Technologies IST GmbH
Rosenthaler Strasse 13
10965 Berlin, Germany
+49-(0)30-7261919-0
http://www.testingtech.de/

The latest version of the NAG Fortran Library includes over 300 new functions, taking the total number of routines in Mark 21 to 1500. New functions include a chapter covering mesh generation that incorporates routines for generating 2D meshes together with a number of associated utility routines. Extensions have been included in the areas of zeros of polynomials, partial differential equations, eigenvalue problems (LAPACK), and sparse linear algebra, along with a significant expansion of the G05 (Random Number Generation) functions.

The Numerical Algorithms Group
1431 Opus Place, Suite 220
Downers Grove, IL 60515-1362
630-971-2337
http://www.nag.co.uk/

Coverity Prevent and Coverity Extend are designed to improve software quality by discovering defects in software code during development, reducing or eliminating computer system crashes, security vulnerabilities, and performance degradation. Coverity Prevent provides automatic analysis, while Coverity Extend supports creation and enforcement of custom coding rules. Both tools work with source code written in C or C++ for FreeBSD, HP-UX, Linux, Solaris, or Windows. The tools are an upgrade to the older Coverity SWAT solution.

Coverity Inc.
185 Berry Street, Suite 3600
San Francisco, CA 94107
978-922-3860
http://www.coverity.com/

The Carbon Project is providing a free development toolkit, CarbonTools 2, for .NET developers building interoperable geospatial solutions. Based on the .NET 1.1 Framework, CarbonTools 2 provides an API for accessing geospatial web services based on Open Geospatial Consortium (OGC) specifications: It supports any OGC Web Map Service (WMS) or Web Feature Service (WFS), and also handles Geography Markup Language (GML), OGC’s XML encoding for geospatial information.

The Carbon Project
47 Westwood Street
Burlington, MA 01803
703-491-9543
http://www.thecarbonproject.com/

Jcorporate has updated its Expresso Framework. The open-source Expresso 5.6 release includes Struts Validator integration, Velocity support, and Maven integration. Expresso integrates and builds on many open-source projects including Ant, Bouncycastle, Cactus, Commons, Log4J, ORO, Oswego Concurrent, JUnit, Struts, Tiles, Velocity, Xalan, and Xerces: more than 65,881 lines of code in 593 classes. Jcorporate also offers an Expresso WebServices component.

JGroup Inc.
757 SE 17th Street, #735
Fort Lauderdale, FL 33316
954-566-0976
http://www.jcorporate.com/

A Fixed-Point Toolbox is available from The MathWorks, bringing fixed-point design capabilities to MATLAB. The Fixed-Point Toolbox lets you develop fixed-point algorithms using bit-true arithmetic and logical operations with word lengths ranging from 2 to 65,535 bits. Paired with Simulink Fixed Point, the Fixed-Point Toolbox can create test benches in MATLAB for bit-true testing of fixed-point Simulink models. The new toolbox also enables input and output of fixed-point data types from Simulink models.

The MathWorks Inc.
3 Apple Hill Drive
Natick, MA 01760-2098
508-647-7000
http://www.mathworks.com/

MsgCommunicator from AidAim Software is a Delphi or C++Builder messenger SDK for creating custom secure client/server and peer-to-peer instant messaging applications with data compression, encryption, and networking technologies. Starting with Version 1.40, MsgCommunicator supports both client/server and peer-to-peer architectures. The tool is available with full source code.

AidAim Software LLC
555 Vine Avenue, Suite 110
Highland Park, IL 60035
http://www.aidaim.com/

Aonix has optimized performance in its 4.2 version of PERC, a clean-room virtual machine that was created to manage the complexity of large, dynamic real-time systems. By improving class-loading technology, Aonix has boosted PERC’s performance on representative benchmarks by up to 30 percent. PERC 4.2 also adds support for the GNU Classpath implementation of the AWT graphics libraries, providing a complement to the Eclipse SWT graphical library support.

Aonix
5040 Shoreham Place, Suite 100
San Diego, CA 92122
858-457-2700
http://www.aonix.com/

The Flash Edition of Xamlon Pro lets you write Flash applications in XAML, Microsoft’s XML-based markup language, rather than being limited to Macromedia’s ActionScript. In addition to offering XAML for simplified development, Xamlon Pro, Flash Edition lets you program back-end logic with any .NET language, including Visual Basic .NET, C#, C++, or JScript. Although .NET languages are used for the application logic, the .NET runtime is not required; deployed applications only require the Flash runtime on the client.

Xamlon Inc.
4275 Executive Square, Suite 525
La Jolla, CA 92037
815-366-8289
http://www.xamlon.com/

DDJ

OF INTEREST


Dr. Dobb’s Software Tools Newsletter
What’s the fastest way of keeping up with new developer products and version updates? Dr. Dobb’s Software Tools e-mail newsletter, delivered once a month to your mailbox. This unique newsletter keeps you up-to-date on the latest in SDKs, libraries, components, compilers, and the like. To sign up now for this free service, go to http://www.ddj.com/maillists/.


Though wise men at their end know dark is right,
Because their words had forked no lightning they
Do not go gentle into that good night.

—Dylan Thomas

Jef Raskin died February 26. Composer, performer, mathematician, painter, model-plane designer, programmer, and critic: Jef was a polymath. But by most of us, he will be remembered for his work on computer and software design, and for his unflagging efforts to make computers work for people, not the other way around.

That was his ambition when, as employee number 31 at Apple (after a stint as a Dr. Dobb’s Journal editor), he started a skunkworks project that he named “Macintosh.” If Jef Raskin’s Macintosh had been built instead of Steve Jobs’s Macintosh, it probably would have cost half as much and looked like an Osborne 1. The driving software probably would have resembled a clever extension of a text editor more than an operating system. And it definitely would have been designed for (Raskin’s acronym) the “PITS,” the Person In The Street.

That was also his ambition when he licensed a design to Canon and saw it released in the Canon Cat, a fine product that could have changed the way we use computers if Canon hadn’t abandoned it. The Canon Cat software didn’t have file names or directories. It didn’t have applications and almost didn’t have modes. It was so mode-averse that Jef designed it without a power switch.

That was his ambition when he developed Archy, the project that he was trying to finish when he died. Archy is a nucleus to which commands can be added, in contrast with an operating system to which applications can be added. It’s an attempt to move on to the next thing after CLIs and GUIs, synthesizing what is best in each into a framework that is both efficient to use and easy to learn. Jef’s son and others are carrying on his work and hope to release Archy this year (see http://www.raskincenter.org/).

Jef was a user-interface visionary who had both close and distant vision. He didn’t just dream up new ways for humans and computers to interact; he worked out detailed specs for his ideas and backed up his decisions with solid research on how people actually behave. He was capable of denouncing the entire Mac/Windows GUI approach, while also laying out precise, scientifically grounded specs of details of the user interface he wanted to see.

He was also a gifted communicator, as demonstrated in his teaching, his early documentation work at Apple, his book The Humane Interface (ACM Press, 2000; ISBN 0201379376), and in inspiring and provocative essays on his web site.

But so far, his inspiring and provocative words have forked no lightning. This makes me mad, a not inappropriate reaction, I think.

I recall Jef as irascible, sarcastic, opinionated, always ready to defend his view of things, unwilling to suffer fools gladly or to put up with inefficiency or inconsiderate design. His tenure at Apple was cut short by his unwillingness to put up with Steve Jobs, and I read somewhere that he once resigned from a teaching job by serenading a university official from a hot-air balloon.

I don’t see Jef as one to go gentle into that good night.

So in that spirit, I want to flame just a little about one issue that Jef fought for: the elimination of the application software model of software development. I don’t claim that I will be promoting Jef’s views, precisely, just flaming under his inspiration.

When I was trying to become a writer, one thing that got in my way was the typewriter. A certain lack of manual dexterity made me a really poor typist. The advent of the personal computer with word processing was a wonderful thing for me: The forgiving nature of the word processor allowed me to learn to type modestly well, and to write much more effectively. I really became a writer when I was able to get rid of my typewriter.

Now I want to get rid of my word processor, and every application program that requires me to go through a tedious launch process to use its capabilities and puts me in its own world where I have to remember to express what ought to be reflex actions in its peculiar vernacular.

Applications are today’s typewriters, and I eagerly await the day when they go away.

Now wouldn’t that be a nice vindication for Jef.

SWAINE’S FLAMES

The Dying of the Light

Michael [email protected]
