Compound Interest in IT
by
Bernard Ng
(updated Dec 20th 2000)
The magic of compound interest in IT bears a strong resemblance to its effect
on one's personal finances. This similarity first occurred to me after I read
"The Wealthy Barber"
by David Chilton in 1990.
The book clearly describes what most of us already know, that it is better to
earn interest on our savings and investments than live on credit and owe other
people lots of money. The detailed explanation of the mechanics of compound
interest led me to compare it to some frustration I was experiencing at work,
and so I became motivated to write this document to share my hypothesis.
Over the years, I have been observing knowledge workers in the IT industry, and
all my observations have reinforced my conviction that there is plenty of truth
in my hypothesis, although I believe it is too subjective and complex to prove
scientifically. There were numerous other sources that contributed to
my thoughts. Most noteworthy are the pearls of wisdom to be found in
Peopleware
by Tom DeMarco and Timothy Lister
(
I've summarized this book
).
Many of the phenomena I've observed are clearly explained in this book.
The only other material deserving explicit credit is what I learned attending
the "Managing Your Time" course at Sun University.
I found it interesting to extrapolate organizational effectiveness from
individual effectiveness when operating in each of the 4 time quadrants
(1- important & urgent,
2- urgent but unimportant,
3- important but not urgent,
4- neither urgent nor important).
It was only in 1997 that I felt compelled to share this personal revelation, so
I scribbled the first draft during a trans-Pacific flight. I will keep updating
this document with more case studies if I believe they add some value.
If you are thoroughly familiar with the magic of compound interest applied to
personal finance, then skip to section 2 on the
Nature of IT Work
. If you possess deep insight on the IT industry, you may opt to familiarize
yourself with 3 variables I use (E, W & K) in the 1st paragraph in section 2,
then jump on to section 3 on Compound Interest in IT Work
.
Let's start with a simple example: If you deposit $1,000 per month into an
investment account that returns a modest 6% interest compounded monthly, it
would take you 30 years to reach $1 million. You actually deposit a
total of $360,000 (30 x 12 x $1,000), which means that about $640,000 is
interest that the financial institution pays you. After 11 years and 8 months,
the interest earned each month overtakes the $1,000 you are depositing each
month. Ignoring the effects of
inflation, many of us would be happy to have started such an account some years
ago. Now for a more impressive example: If you invested $12,000 per year in the
S&P 500 Index, which returns 10.5% per year, it would take you just 23 years to
cross the same $1 million mark!
Even factoring in painful realities such as income tax, having $1 million in
financial instruments that match the performance of the S&P 500 gives you the
comfort of having $60,000 to $70,000 per year to live on, even if you don't
lift another finger to work for the rest of your life.
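The two savings scenarios above can be checked with a few lines of Python instead of a spreadsheet. This is a minimal sketch under stated assumptions: interest is credited on the opening balance, each deposit lands at the end of the period, and taxes, fees and inflation are ignored. The function names are made up for this illustration.

```python
def periods_to_reach(target, deposit, rate_per_period):
    """Count periods until the balance first reaches the target.

    Assumption: interest is credited on the opening balance and the
    deposit is added at the end of each period.
    """
    balance = 0.0
    periods = 0
    while balance < target:
        balance += balance * rate_per_period + deposit
        periods += 1
    return periods


def first_period_interest_beats_deposit(deposit, rate_per_period):
    """Return the first period whose credited interest exceeds the deposit."""
    balance = 0.0
    period = 0
    while True:
        period += 1
        interest = balance * rate_per_period
        if interest > deposit:
            return period
        balance += interest + deposit


# $1,000 per month at 6% compounded monthly
print(periods_to_reach(1_000_000, 1_000, 0.06 / 12))        # 360 months, i.e. 30 years
print(first_period_interest_beats_deposit(1_000, 0.06 / 12))  # month 140, i.e. 11 years 8 months

# $12,000 per year at 10.5% compounded yearly
print(periods_to_reach(1_000_000, 12_000, 0.105))           # 23 years
```

Changing the timing assumption (for example, depositing at the start of each period) shifts these counts by a period or so, which is a good reminder of how sensitive the result is to small details.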
Conversely, if you bought a car for $12,000 but decided to float it
for 5 years on your credit card, which charges the typical 12.9% compounded
monthly, you'd find yourself owing $23,000 (almost double) 5 years later.
Instant gratification comes at a high price. Credit card companies love it when
you just make the minimum payment each month. If the amount you owe is large
enough, the total amount you owe will grow larger and larger each month. Taking
the scenario of owing a credit card company $23,000 for a car or any other
purchase that you couldn't really afford in the first place, you can make
payments of about $250 per month ($3,000 per year) for the rest of your life
and still die owing them $23,000!
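The debt scenario can be sketched the same way. The snippet below assumes no payments at all during the 5 years, monthly compounding at one twelfth of the annual rate, and no late fees; the function names are invented for this illustration.

```python
def debt_after(principal, annual_rate, years, periods_per_year=12):
    """Balance owed if nothing is paid and interest compounds every period."""
    periods = years * periods_per_year
    return principal * (1 + annual_rate / periods_per_year) ** periods


def interest_only_payment(balance, annual_rate, periods_per_year=12):
    """Payment that covers one period of interest and leaves the balance unchanged."""
    return balance * annual_rate / periods_per_year


owed = debt_after(12_000, 0.129, 5)
print(f"Owed after 5 years: ${owed:,.0f}")   # about $22,800

payment = interest_only_payment(23_000, 0.129)
print(f"Interest-only payment on $23,000: ${payment:,.2f} per month")
```

At these figures the monthly interest on $23,000 comes to $247.25, which is why a payment of about $250 per month barely touches the principal.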
You can convince yourself of the magic of compound interest by using a
spreadsheet to vary the scenarios. My own experience with looking at the
numbers is that the result is not always intuitive, but it is usually
surprising. There seems to be some magic that accelerates the growth of what
starts off as a small number into a much bigger number. So anybody with half a
brain will conclude that they would like this magic to be working for them, and
certainly not against them. Those who have built a fortune certainly understand
this and use it to work for them.
Sadly, this seemingly simple phenomenon is not understood by the masses. It is
used to benefit even fewer. Most people who you'd think should be in good
financial health but aren't are victims of compound interest acting against
them. I have more friends and family who are such victims than I want to count.
So if my hypothesis is plausible and this really has a parallel in IT, IT
workers and organizations should also take heed and let the magic work for
them, or at least not against them.
IT work is inherently knowledge-based. All other things (like demeanour) being
equal, a person's effectiveness (E) varies as a function (f) of the work (W)
they have been assigned, and the knowledge (K) they currently possess to tackle
that work. I've grouped theoretical knowledge and practical experience into K.
E = f(W, K)
Don't misunderstand how much emphasis I place on an individual's ability to get
along with others and exhibit great teamwork; I'm just trying to keep this
discussion simple enough to be useful.
I'll limit myself to basic formulae since the strength of my argument is
directly proportional to how much experience you have in the IT industry
anyway. Regardless of your experience, some things should be quite clear:
- If K is near zero, almost any reasonable W will result in low E.
People with very little knowledge (even indirectly) applicable to W cannot
contribute effectively because they have to learn from others, from mistakes or
from some form of documentation. This is termed OJT (on-the-job training).
- The lower bound of knowledge (Klb) is not zero! It is some
negative value limited by how misinformed this person has been. For example, if
you need to drive to San Jose from San Francisco Airport and K represents the
relative positions of the two cities, K = 0 when you have no idea where San
Jose is. K is negative if you think San Jose is north of San Francisco. The
exact value depends on how far beyond the Golden Gate Bridge you drive before
you realize you are going the wrong way. The old saying that
"A little knowledge is a dangerous thing" applies here.
- The lower bound of effectiveness (Elb) is not zero! It is some
negative value limited by how much access the person has to systems and other
people. A person with low enough K (or sheer ignorance) under intense work
pressure to complete W can potentially inflict severe harm on the health of an
IT system or project, and lower the effectiveness of others at the same time.
- For any fixed target of W, there is some minimum knowledge (Kmin)
required. This either existed before the project, is acquired during the
project whilst completing W, or is never acquired as W is never completed.
Kmin for any W corresponds to Emin. To simplify the
analysis, it is safe to assume that Emin is at some arbitrary
industry average, and as negative as I am about the industry, it is probably
somewhere above zero. If I really wanted to be controversial, I could cite
several references suggesting Emin is actually negative. There can
be no consensus due to the subjectivity of the topic. A good day at work for
me constitutes doing something well or learning how to do something well. I
know many people (all to remain anonymous) for whom a good day at work means
they get to keep their jobs. The IT industry is so immature that the chains of
incompetence are pervasive.
- If K < Kmin, as Kmin - K starts from zero and
increases, E gets further below Emin non-linearly. I conjecture that
it accelerates until it becomes negative then decelerates and becomes
asymptotic to Elb. With the current pace of technological
advancement, although missing some required knowledge is commonplace enough, it
can be overcome via OJT. In fact some ambitious people insist their work be
challenging enough that Kmin is above the K they currently possess
because they want to learn something on every project. The point is that the
leap must not be too great as K << Kmin is a recipe for certain
disaster unless this person only plays a bit part in the project and is
primarily there to increase his or her own K in the hope of contributing at a
later stage or a future project, or W is research work.
- If K > Kmin, as K - Kmin starts from zero and
increases, E gets further above Emin non-linearly as well. I believe
this curve is gradual: E increases slowly and is asymptotic to some optimal
value where having any more knowledge or experience no longer contributes
because the knowledge is only distantly relevant. The point is that
knowledgeable and experienced people are the minority because of the large
amount of information out there, and the pace at which it changes. It takes a
significant amount of effort to excel in any area, and constant attention to
stay up-to-date.
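The shape conjectured in the last two points can be sketched numerically. Everything in the snippet below is an assumption made for illustration: the constants standing in for Kmin, Emin, Elb and the optimal E, the two scale parameters, and the choice of exponential forms. They are picked only so that E equals Emin at K = Kmin, falls in an S-shape toward Elb as K drops, and levels off at an optimum as K grows. None of it comes from measurement.

```python
import math

K_MIN = 10.0   # hypothetical knowledge required by the work W
E_MIN = 1.0    # hypothetical effectiveness at exactly K_MIN
E_LB = -5.0    # hypothetical lower bound on effectiveness
E_OPT = 3.0    # hypothetical ceiling where extra knowledge stops helping


def effectiveness(k, fall_scale=4.0, rise_scale=8.0):
    """Illustrative E as a function of K for one fixed W."""
    if k < K_MIN:
        gap = K_MIN - k
        # S-shaped fall: slow at first, then steep, then flattening toward E_LB.
        return E_MIN - (E_MIN - E_LB) * (1.0 - math.exp(-(gap / fall_scale) ** 2))
    surplus = k - K_MIN
    # Gradual rise with diminishing returns toward E_OPT.
    return E_MIN + (E_OPT - E_MIN) * (1.0 - math.exp(-surplus / rise_scale))


# Negative K values represent misinformation, as in the San Jose example.
for k in (-10, 0, 5, 10, 20, 60):
    print(f"K = {k:>4}: E = {effectiveness(k):6.2f}")
```

With these made-up numbers, a modest shortfall in K already pushes E below zero, while a large surplus adds comparatively little, which is the asymmetry the points above describe.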
In teams, projects or large organizations, the above relationships between K,
W and E are further compounded by the knowledge (Kother) possessed by other
people accessible to the individual. I won't attempt to formulate anything
here but it suffices to state that a person's low K can lower Eother
and a person's high K can raise Eother. Your own experience probably
bears this out: Projects where the collective knowledge (Kcol)
is greater than Kcol-min, and there aren't too many people (since
communication costs don't scale well), have a decent chance of succeeding.
Projects where no team member possesses Kmin are doomed to fail
very badly.
Any unit of IT work (W) presumably has a benefit to it; this benefit may not be
easily quantifiable but it should either:
- Increase the business revenue stream without a disproportionately large
increase in cost.
- Lower the cost of doing business without adversely affecting revenue.
- Be a prerequisite activity that enables or promotes one of the above in the
future.
Any IT industry veteran (or staunch
Dilbert
fan) will tell you that many projects do none of the above. But let's restrict
our discussion to the ideal world where we can assign a positive dollar value
to the long term contribution of W to the company.
Even in your wildest open-source dreams, IT workers do not work for free. The
effectiveness (E) of an IT worker on the project, alternatively termed ROI
(return on investment) or cost-effectiveness, can also be assigned a monetary
value.
Examining the simple case of a one-person project, E is positive when W exceeds
the wage (salary & benefits) and material costs of achieving the project goals.
(Note that a positive E can still be unacceptable, especially when E <
Emin.)
I've encapsulated all the complexity of time, morale, etc. into the function f()
because those factors are not central to our discussion. They are given
separate treatment in section 5 on
Recurring Themes
.
The adage "Knowledge is Power" is as true in IT as in any other business.
Knowledge (K) is the pivotal element in the IT cost-effectiveness
formula:
E = f(W, K)
A unit of work (W), like setting up a web storefront to sell cameras, where you
make a projection that the new storefront will increase your quarterly sale of
camera units by some quantity, is relatively quantifiable. K is impossible to
quantify on its own. For example, if a subset of your K is your absolute
mastery of the COBOL language and development environment, you could have made
a small fortune during the Y2K panic but will find it much harder to locate
lucrative projects requiring those skills now. In that sense, K cannot be
measured in absolute dollar terms but is much closer in nature to frequent
flyer miles. It is acquired through various activities like reading books and
magazines, attending training and conferences, benchmarking, experimentation,
prototyping, and in the course of doing technical work like administration,
design, development and troubleshooting. Like frequent flyer miles, K also has
the notion of expiry although it resembles a half-life function more than a
binary one. If you want to carry the analogy further, this gradual obsolescence
of knowledge is similar to cost of living inflation.
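To make the half-life comparison concrete, here is a minimal sketch of how the usable value of a skill might decay. The 5-year half-life and the starting value of 100 are assumed figures chosen for illustration only; the essay does not supply a number, and different skills would decay at very different rates.

```python
def remaining_value(initial_value, years_elapsed, half_life_years):
    """Value left after exponential decay with the given half-life."""
    return initial_value * 0.5 ** (years_elapsed / half_life_years)


# Hypothetical skill worth 100 units today, with an assumed 5-year half-life.
for years in (0, 5, 10, 20):
    print(years, remaining_value(100, years, 5))
```

Unlike a binary expiry date, the value never drops to zero overnight; it simply buys less and less each year unless it is topped up.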
K differs from mileage points in that using them on one project
doesn't reduce their quantity for the next project. Herein lies the effect of
compound interest in IT. If you've worked hard to acquire a high K level, your
E returns for a typical unit of W will be high. In real terms, this translates
to finishing the work on target with satisfactory quality such that you have
time and/or justification to assign yourself to other K-increasing activities.
I've observed and been fortunate to experience this multiplier effect that
closely resembles a positive cashflow situation. Once K breaks out of the
Kmin orbit, one uses up less energy to increase it. Or in simpler
language, once you've attained a critical mass of knowledge that allows you to
exceed your job requirements comfortably, you tend to be more motivated, have
more time, and have to use less energy to acquire more knowledge. Here is a
more detailed enumeration of the reasons behind this effect:
- Although the industry as a whole has a terrible reputation for
underestimating the time and effort required for projects, there is still
some first-approximation consensus out there for any W.
- If your K level is far above average, you can meet or exceed expectations
and still reserve time for K-building activities.
- High K levels are unfortunately not the norm in the IT industry where
demand for a skilled workforce far outstrips the supply. As such, if you've
attained a high K level in a particular area, you probably understand its value
and are more motivated to retain that advantage.
- Although the barrier to entry is relatively low (compared to medicine or
law for example), it takes a fair amount of time to build up the experience
(the facet of K as important as theoretical knowledge) necessary to tackle the
complex information tools and systems in use today.
- Once you've demonstrated a high K level to your peers, they tend to
approach you for critique or to help them out of an impasse. This may consume
your precious bandwidth but also offers a wide variety of scenarios for you to
learn from without having to be fully engaged in those projects. The secret
here is to reprimand those who can't be bothered to RTFM, so that the quality
(complexity) of problems brought to you is raised, and most of them turn out to
be win-win situations because you also learn something while helping them.
- Many areas of K are prerequisites to other areas, thus a high K level
facilitates building it up further.
- Knowledge in diverse areas of technology may give you insight into design
patterns which make it much easier for you to pick up new K in an area that uses
the same patterns.
- It is common knowledge that there are many ways to solve any problem.
The humorous saying, "If the only tool you know is the hammer, every problem
looks like a nail," is a different way of saying this. With a high K level,
you will be equipped with alternative approaches to various problems, and
choosing the most appropriate one will increase the quality of your work.
- A strange phenomenon I've experienced even when tackling W that demands
Kmin exceeding the K which I possess is that with my K level being
high enough, I have a calm confidence that allows me to chip away at the
problem without being fazed.
Even in IT, the alternative to positive cashflow is deficit spending. It is a
very unpleasant predicament that I found myself in during the early years of my
career and have resolved never to subject myself to it again. Nowadays, I find
it disconcerting just to watch others flounder in that sorry state.
Nevertheless, I list the reasons behind this negative force since an
understanding of it should help others overcome it.
- The hardest part of any project is the start. There is enough natural
inertia in getting a project underway. This inertia increases when K <
Kmin.
- There is good reason why they say "Hindsight is 20/20". When you
are missing some of the K required to complete W, it may not always be
obvious what exactly that K is, and how to go about acquiring it. There is
always the risk of picking up superfluous K (although it may benefit a future
project) before homing in on the relevant K. This reduces E when the delay
becomes significant.
- If there are people with high K accessible to you, you might be disturbing
them with really trivial RTFM-class questions. This not only lowers their E on
whatever else they are working on, but may decrease your accessibility to them
in the future when you really need it.
- Trying to acquire missing K under schedule pressure is counter-productive.
It has been scientifically proven that humans are less productive when they
experience low self-esteem, and when they are under the emotional stress that
comes from not seeing light at the end of the tunnel.
- Unlike constructing a building, where badly tilted structural beams are
quite obvious even to the untrained eye, software is so intractable that severe
deficiencies may not be obvious until a system goes into pilot or production.
Even the addition of ultra-high K members to a team at such a late stage may
not be enough to salvage a project. At best, a lot of effort is already wasted
and W is still delivered but with much lower E.
- Workers with deficient K are prone to making mistakes. I have not found
any correlation between how low their K levels were and the magnitudes of the
mistakes, only between how much they were responsible for and how much they
managed to screw up. Some mistakes have far-reaching effects and can result in
many other people (regardless of K level) scrambling around to try to resolve
the problem. My only advice is to keep low-K personnel off mission-critical
systems and projects. Either invest in them to upgrade their K levels, or get
rid of them. Since there are hardly ever any one-person projects or systems,
this negative multiplier is perhaps the single most significant contributor to
compound interest working against you in IT.
OK, so far all I've shared with you is theory. How do theories come about in
the first place? Either someone is a genius who can simulate extremely complex
natural occurrences within their minds, or has made some observations through
their own experience and notices some patterns. The latter is true in my case.
As I mentioned earlier, it's impractical for me to scientifically prove this
hypothesis, but I hope my experience will stimulate reflection on your own
experiences and lead you to the same conclusions. Whether that happens or not,
I'd be happy to hear about
your experiences and conclusions
.
I've deliberately omitted all names of people and projects to protect the
guilty but have been quite liberal with providing lots of other details.
If you are among the anonymous guilty and are sure you can explain how the bad
things that happened on your project were unavoidable, I will promptly remove
it from this document. If I find your excuses lame, I will name you and the
project. This is to reduce the amount of time I waste on useless activities.
My case studies are not meant as personal attacks; I would use
full names, dates and places if they were. Rather, they are meant to
illustrate the good and bad effects of compound interest in the IT workplace.
The case studies are grouped by the activity they were observed in just to
demonstrate the ubiquity of this effect.
If you don't have time to look at the 25+ case studies which I've painstakingly
detailed, you can jump ahead to section 5 where I've extracted some
Recurring Themes
.
4.1 Designing
|
Low K levels beget low K levels.
|
I once worked for a development manager who
knew enough to get her job done (K > Kmin) but not too much more.
As a result, she was wary of a few of us in the group who were passionate about
programming and willing to try out new things just to see how they worked.
There was a project which a coworker (also a friend) and I thought was highly
appropriate for implementation in C++, but since our manager only knew C, she
cited instability in the new technology as an excuse not to even try it.
Fortunately, my friend also had more stamina than my boss, and she conceded the
point after they debated one evening for a few hours. Humans are susceptible to
feeling insecure. A supposedly technical manager with low K lowers the group E.
|
Lower-level platform K is a good thing.
|
The above-mentioned manager assigned me to tune an application she had written
because its run-time of 11 hours was impacting factory production hours. It was
an assortment of C programs and shell scripts that filtered through a dump file
of our bill-of-materials (BOM) to extract the relevant part hierarchies for the
various shopfloor databases. When I mentioned trying to use the relatively new
mmap() call (memory-mapping) to speed up file I/O, she vehemently opposed using
any 'fancy' techniques. She said all I needed to do was to reduce the run-time
down below 7 hours. I avoided further discussion, used mmap() and pointer
swizzling instead of staying with the inefficient method of continually forking
filter programs from a script, and cut the run-time down to 4 minutes.
Evidently, my solution was scalable enough since my extract program remained in
production for 9 years until we replaced the entire shopfloor system. I'd love
to gloat about what happened to that manager but that is irrelevant to this
discussion. Fortunately, my earlier 2 managers had high K levels, and
encouraged me to master the many capabilities of our powerful OS. The software
we typically manipulate is built upon layers and layers of other software;
building up high K levels in the lower platform layers can only help.
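For readers who have not met memory-mapping, here is a minimal sketch of the idea. It is not the original program, which was written in C against the mmap() system call and also relied on pointer swizzling; this uses Python's standard mmap module instead, and the record layout and file contents are hypothetical. The point is only that the whole dump is mapped once and scanned in a single process, rather than being piped through a freshly forked filter program for every pass.

```python
import mmap
import os
import tempfile


def count_lines_with_prefix(path, prefix):
    """Count lines starting with prefix by scanning a memory-mapped file."""
    count = 0
    with open(path, "rb") as handle:
        # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only.
        # Note that mapping an empty file raises ValueError.
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as view:
            for line in iter(view.readline, b""):
                if line.startswith(prefix):
                    count += 1
    return count


# Hypothetical dump file: one "part,parent" record per line.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as dump:
    dump.write(b"A100,ROOT\nB200,A100\nA101,ROOT\n")
    dump_path = dump.name

print(count_lines_with_prefix(dump_path, b"A"))
os.remove(dump_path)
```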
|
Better to have less high-quality K, than more low-quality K.
|
Sometimes, when a person's K doesn't rest on a strong foundation, or has never
been tested in a real production environment, the K becomes a liability. My
peer technical lead and I were flown to another continent by the project
manager in the hope that we would be convinced to use technology developed by
his 'blue-eyed boy' from their time together in another company. We patiently
listened to a proposal for a communications infrastructure for our new factory
test system, and were appalled to hear that the messaging was based on a
non-terminating broadcast scheme. In other words, messages would traverse
LAN segment boundaries and be propagated until there were no more customized
router daemons at the ends of a 'logical ring', even if the intended recipient
was residing on the same host! We promptly shot the proposal down. Some
principal engineers are worth less than a fresh graduate.
4.2 Experimenting
|
Academic qualifications are no guarantee of anything.
|
We had a PhD candidate from MIT in a technology group I belonged to. He was a
highly intelligent person with a strong conceptual grasp of technological
issues, and a nice person as well. I didn't notice until later that he avoided
working on any experimental project alone. In retrospect, we realized that we
did all the programming on all the projects he was on, and that he was hardly
capable of stringing a 5-line script together. He moved on (for a nice raise)
to Xerox XSoft, which was probably a better fit since they hardly produced
anything either (OK, low blow). His one unforgettably wonderful deed was to
bring in Jim Coplien, an ex-colleague of his, from Bell Labs to instruct us on
OO and C++. I still track Jim's work on patterns.
|
The lower bound of E is a function of the people and systems accessible by
the person with low K.
|
Reinforcing the previous point, my boss hired someone with a masters degree in
Computer Science (MSCS) to strengthen my team.
I was leading a contingency effort to port our existing mission-critical test
system to a new OS because new workstations did not have sufficient diagnostics
on the old OS.
The new-hire was obviously quite a ways behind as she is best remembered for
lifting up an RS232 connector and asking another colleague, "Ethernet?".
Her limitations soon became obvious so my boss assigned her the development of
some 'harmless' utility script while 3 of us plodded away on the main problem.
One day, she came running frantically into my lab asking if anyone knew what
had happened to her script. To our dismay, we discovered that her script had
blown the entire filesystem away, taking down itself as well as 2 days' worth of
our prototyping work. Call me an extremist but I believe that low K personnel
should be isolated to separate systems and networks, firewalled if possible.
4.3 Implementing
|
Some K is highly applicable to areas it was not intended for.
|
I took a college course in "VLSI Design" that involved the use of a multitude
of CAD tools, most of which were stages in the automatic generation of a chip
from logic specifications. It was bad enough that 50 students had to share
12 workstations; worse, most of the students I observed were typing in long
incantations of piped commands to get their work done. The more software-savvy
were embedding these commands into scripts. Thanks to having worked with people
during my internship at Sun who pointed me to source control and automatic
builds, I configured the dependencies in the pipelined stages with relative
ease and all my project team had to type after that was "make".
|
K level may not be proportional to how vocal a person is.
|
On a 15-person, cross-geo team designing a new test system, there was a
particularly vocal person who took pleasure in excessive demonstration of his K
level (he had an MSCS of course). He used a condescending tone when making
presentations and challenged design decisions frequently. It would have been
tolerable except he was wrong most of the time. After wasting time proving him
wrong in a few agitated public debates, I forced the project manager to remove
him from the project to avoid further slowdown. People with K <<
Kmin should be booted off projects unless they have a nice
personality.
|
Low quality K can be worse than no K.
|
On the topic of personality, a really nice coworker (with an MSCS) on the above
project was tasked to write a resource manager daemon. Unfortunately, he was
only superficially trained in OO and coding for performance. While reading a
database table with R rows and C columns, instead of speeding up access by
batching up the reads, he hit the database R x C times! The high-level design
had already passed through design review, but schedule pressure prevented us
from performing a code review until he had amassed 15,000 lines of molasses.
Another high K colleague and I subsequently wrote an infinitely faster,
fault-tolerant version in less than 4,000 lines. It was the first time I had
worked with this person, and I'd gladly work with this person again because he
is a genuinely nice person, but the complexity of the job (Kmin)
must be within reach of his current K level.
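The difference between the two access patterns is easy to show in miniature. The sketch below is not the daemon's code: it uses Python's built-in sqlite3 module with an in-memory database, and the table name, column names and row count are all hypothetical. It only contrasts one query per cell (R x C round trips) with one batched query for the same data.

```python
import sqlite3


def load_cell_by_cell(conn, row_ids, columns):
    """Anti-pattern: one query per cell, so R x C round trips."""
    table, queries = {}, 0
    for row_id in row_ids:
        for column in columns:
            # Column names come from a fixed list here, never from user input.
            cur = conn.execute(
                f"SELECT {column} FROM resources WHERE id = ?", (row_id,)
            )
            table[(row_id, column)] = cur.fetchone()[0]
            queries += 1
    return table, queries


def load_batched(conn, columns):
    """Batched read: a single query returns every row and column."""
    cur = conn.execute(f"SELECT id, {', '.join(columns)} FROM resources")
    table = {}
    for row in cur:
        for column, value in zip(columns, row[1:]):
            table[(row[0], column)] = value
    return table, 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resources (id INTEGER PRIMARY KEY, name TEXT, state TEXT, owner TEXT)"
)
conn.executemany(
    "INSERT INTO resources VALUES (?, ?, ?, ?)",
    [(i, f"res{i}", "idle", "nobody") for i in range(1, 5)],
)

columns = ["name", "state", "owner"]
slow, slow_queries = load_cell_by_cell(conn, range(1, 5), columns)
fast, fast_queries = load_batched(conn, columns)
print(slow_queries, fast_queries, slow == fast)   # 12 queries versus 1, same data
```

With an in-memory database the cost per query is tiny; against a networked database server each of those extra round trips adds latency, which is where the molasses comes from.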
|
You only need enough K to leverage off the K that others have.
|
I began using Perl around 1990 to automate system administration tasks. I
started using it again in the last few years for CGI processing. I am by no
means a Perl expert but just knowing how to use it, and of the existence of
the CGI module (CGI.pm) allowed me to accomplish my tasks very efficiently.
K does not need to be very deep for you to be very efficient. You still need
to have enough K to know which little K to acquire to leverage off others.
4.4 Testing
|
Safety in numbers: High K coworkers give you the courage to follow your
convictions.
|
From the military point of view, I don't believe in taking the chance that I've
advanced so fast that enemies are left behind my lines. You don't need to have
served in the army to sense that an enemy behind you is a lot more dangerous
than one in front. (The notable exception being MacArthur's leapfrogging of
immobilised Japanese troops in the final stages of the Pacific War). Being a
programmer since 1978 (high school), I have always had a deep sense that I am
more productive when I test the heck out of a freshly written block of code
than when I amass tons of code before I even run it once. The rationale behind
incremental, continual testing is very sound: Context switching, an expensive
activity for operating systems, is much more expensive for humans. When you try
to locate and fix a problem with something you've done a few minutes ago, it
will take you a lot less time than if you try to fix it a few days later.
The comparative cost goes up with code complexity and duration of elapsed time,
peaking with the scenario of someone fixing code so long after writing it that
it might as well be someone else's code. Unfortunately, typical IT management
seems to reward people for hitting deadlines and reaching milestones, not for
the quality of their work. Since one needs K to recognize quality, it is a lot
easier for the typical manager, who can't actually do what his people do, to
just measure quantity. This brings about an individual contributor's quest for
'pseudo productivity', just churning out as many lines or modules as he or she
can get away with in the shortest amount of time. Or better yet, independent of
actual output, being able to report as many line items of apparent progress on
a status report as possible. I have an excellent ex-colleague and friend to
thank for giving me the moral courage to depart from this norm. His code is so
well surrounded by unit and integration tests that it inspired me to do the
same for mine. He wrote the first public domain C++ wrappers for XView, the
RPC-based ToolTalk database server, and finally made it big after 6 start-ups.
He himself was influenced by a friend who is an expert in operating systems and
an author of a prominent multithreaded programming book. And I believe that
person was mentored by the guy who invented the Self language and implemented
early Smalltalk environments. My point is that many passionate novices possess
the right instincts for high E in IT work. When surrounded by high K coworkers,
their potential is maximized because they acquire the courage to do what is
right even when politically incorrect. And an obvious compounding effect takes
place because it costs a lot less to get it right earlier than to come back to try
to fix things later on.
|
Slowing down in order to go faster.
|
How many IT projects do you know that come in on time or early? Having had strong
influence from high K coworkers as described above, I developed enough K myself
to implement 2 unconventional ideas in a large project: a throwaway prototype
and mandatory regression tests in all modules. We were building a test system
that required a persistent messaging system. The old system was built on RPC
(transient messaging) so there were a lot of unknowns for us to handle. With
high K coworkers supporting my idea, we spent 3 weeks on a throwaway prototype
(unthinkable for a typical manager to swallow) and got all the important
questions on our list (including scalability) answered. The additional time
required to write regression tests and harnesses was even more significant.
By my estimate, it may have added more than 6 man-months to the 150 man-month
project. But I know that we would not have completed the implementation early
without the rigor and discipline. We used the remaining time to test the heck
out of it at system level and went into production on time. When I first joined
the group supporting the old system, I got paged twice a week on average. After
the new system went live, I have probably been paged twice since.
|
Chicken or the egg? Test or the code?
|
The second question is actually a rhetorical one. After interacting with many
high K people at OOPSLA conferences, I've become convinced that it is
preferable to write a regression test harness and some basic tests BEFORE
writing code whenever possible. For example, if I'm developing an XSLT to
transform one document type into another, the first thing I'd do nowadays is to
write or find a few examples of correct corresponding source and target
documents so I immediately know when my work is done. An advantage over the
traditional sequence of writing the code first is that people who write the
code first will not be able to resist the temptation of running source
documents through it to examine the output. Depending on the type of
application, one can be more susceptible to accepting erroneous output as
correct, delaying the discovery of the problem to a later date or never. I've
internalized this now, thanks to the K levels of the people in the XP (Extreme
Programming) movement. This illustrates how their K level has compounded my E
as well as their own.
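As a rough sketch of the test-first sequence described above (in Python rather
than XSLT, with hypothetical sample documents and a placeholder transform that
are not from the original project), the expected source/target pairs are
written first and the harness simply reports which ones do not match yet:

```python
def run_regression(transform, cases):
    """Return the names of the cases whose output differs from the expected target."""
    failures = []
    for name, source, expected in cases:
        if transform(source) != expected:
            failures.append(name)
    return failures


# Hypothetical source/target pairs, written BEFORE the transform exists.
CASES = [
    ("single item", "<item>pen</item>", "<li>pen</li>"),
    ("empty item", "<item></item>", "<li></li>"),
]


def transform(source):
    """Placeholder standing in for the real transformation under development."""
    return source.replace("<item>", "<li>").replace("</item>", "</li>")


failing = run_regression(transform, CASES)
```

The work is done when the list of failing cases is empty, which removes the
temptation to eyeball the output and accept it as correct.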
4.5 Deploying
|
Spend $1 on a credit card and owe $1.1 million in 78 years.
|
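The headline figure can be checked with a few lines of arithmetic. The sketch
below assumes an 18% annual rate compounded monthly with no repayments; that
rate is an assumption chosen for illustration, not a figure stated here.

```python
def balance_after(principal, annual_rate, years, periods_per_year=12):
    """Balance owed if interest compounds every period and nothing is repaid."""
    periodic_rate = annual_rate / periods_per_year
    return principal * (1 + periodic_rate) ** (periods_per_year * years)


def implied_annual_rate(principal, balance, years):
    """Yearly-compounded rate that grows principal into balance over the years."""
    return (balance / principal) ** (1 / years) - 1


# Assumed for illustration: 18% APR compounded monthly (not stated in the text).
owed = balance_after(1.00, 0.18, 78)
```

Under that assumption the $1 grows to a little over $1.1 million. Compounded
yearly instead, a rate just under 20% produces the same result.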
Ignoring late charges which would further accelerate the shock, and suspending
the reality that nobody would give you such a high credit limit, the above
scenario is numerically possible. Such are the parallels in IT where one tiny
mistake can end up costing a company orders of magnitude more than the cost of
avoiding the mistake in the first place. As a student intern, one of the first
projects I was assigned was to determine how much test data collection we were
losing over several months, and root-cause why it was happening. Coming from
Berkeley where Ingres originated, I was able to hit the floor running and
immediately modify the data entry program to log all test data collected into a
flat file as well as to the Ingres backend. Comparing the daily totals, I
reported that we were missing between 2% and 10% of the entries in the database.
I analyzed the timestamps from the multiple logs and realized that the missing
entries fell into a regular pattern: more data was lost when multiple factory
staff were entering data simultaneously. Picking up a DBA manual for the first
time ever, I realized that the cron job my low K coworker wrote restarted
Ingres in single-user mode, meaning it didn't bother to enable locking and
multiple clients were clobbering each other's entries. The version of Ingres
we used appended rows to each table file one disk block at a time for
efficiency. Without the multi-user mode protection of serializing row
insertions from different clients, each client assumed it had solitary control
over appending to every table. That meant the last client to write a row that
filled up a partial block overwrote any other client that was assigned the same
starting block address. If the culprit had invested 2 hours to RTFM, we would
have avoided 400 man-hours in meetings, investigation and troubleshooting. Not
to mention that lots of data would not have been lost forever.
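The lost-insert pattern can be illustrated with a toy model. This is only a
sketch of the general "last writer wins" effect, not of how Ingres actually lays
out its table files; the TableFile class and slot numbering are hypothetical.

```python
class TableFile:
    """Toy model of a table file that rows are appended to by slot number."""

    def __init__(self):
        self.rows = {}      # slot number -> row contents
        self.next_free = 0  # shared end-of-table pointer


def append_unserialized(table, batch):
    """Every client reads the end-of-table pointer before any of them writes."""
    starts = {client: table.next_free for client, _ in batch}  # all see one slot
    for client, row in batch:
        table.rows[starts[client]] = row  # later writers clobber earlier ones
    table.next_free += 1


def append_serialized(table, batch):
    """Each client takes its turn, so every row gets its own slot."""
    for _, row in batch:
        table.rows[table.next_free] = row
        table.next_free += 1


batch = [("A", "row-a"), ("B", "row-b"), ("C", "row-c")]
unsafe, safe = TableFile(), TableFile()
append_unserialized(unsafe, batch)
append_serialized(safe, batch)
```

In the unserialized case only the last client's row survives, which matches the
observation that more data went missing when more staff entered data at once.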
|
Low K and high speed, a deadly combination.
|
There was a system administrator in manufacturing support who didn't like being
told what to do. When I used to advise him how to take precautions and remind
him that the tiniest mistake of bringing the shopfloor down would cost us $4
million in revenue per hour, he repeatedly chanted out, "I know, I know. No
problem." He was actually a diligent person, always in a hurry to get things
done, but he had a big problem admitting when he didn't know something. One day my
pager went off and I was summoned into an emergency meeting to find out why
production went down as a result of all 3 dataservers rebooting. I knew it was
statistically close to impossible that 3 servers could experience disk failure
within 2 minutes of each other and volunteered to investigate the situation.
Nobody admitted to any fault so it smelled very fishy to me. Pulling the server
room access logs from security, it turned out that the sysadmin entered the
room 5 minutes before 1 dataserver went down and 7 minutes before the other 2.
A drive did fail on 1 dataserver (not a problem as production would go on) but
after he halted the server to replace the drive, he walked to the rear of the
aisle of servers and turned off the power to the rack that contained the other 2
dataservers. I'll leave what happened next to your imagination. I don't know
what became of this person but if he had enough K then, he would be confident
enough to admit lack of K and honest mistakes, and realize that it is OK not to
know a lot of things. And that could have prevented him from making many small
mistakes as well as that very memorable one. For myself, the more K I acquire,
the more K I realize I am missing. The important thing is to have enough K to
make steady progress without lowering everybody else's E.
|
A little knowledge is truly a dangerous thing.
|
When I was an escalation engineer, our team was requested to help customers
recover lost data on many occasions. One particular case I was assigned had a
customer sysadmin who insisted that our Veritas RAID solution was buggy and
demanded we help recover their data. Upon investigation, I discovered that they
had configured 5 plexes into a RAID-0 (striped) metadisk. I was horrified and
asked if they had done a backup. They said they had, but they were wondering why
the hot spare didn't kick in for automatic recovery. I then gave them the bad
news that recovering from failed drives was only possible with RAID-0+1
(striping & mirroring) which required another 5 drives, or RAID-5 (which may
offer slower performance). Their sysadmin was worse than clueless: he knew
enough to lure them into a false sense of security. As you can guess, their
backup recovery didn't work either.
|
Two heads better than one? Twenty can be worse than one if all are low K.
|
A similar case as above involved a telco complaining about the quality of the
drives we supplied them. They complained that our inferior quality product was
causing them excessive downtime. This was a strategic customer who were data
mining from a few hundred gigabytes of phone calls, and it was political in
nature since the group championing our platform were under heavy attack from
other groups in the telco using competing platforms. I was
instructed that I should drop everything else to concentrate on this 'Severity
1' case. After our account manager gave me the case history and I scoured
through the system and database configuration, my response was a nonchalant,
"What do they expect?" In a nutshell, their architect (or lame excuse for one)
had utilized almost 200 drives but had it configured such that any drive failure
would bring down the entire system. At a big meeting where their entire team
was in attendance with some development consultants, 2 sales people and 2
engineers representing my company, I explained that all hardware devices had a
Mean Time Between Failures (MTBF) rating which could be used to predict system
availability. I was thoroughly familiar with the concept having come from the
factory where we housed our own MTBF Lab to keep our parts suppliers honest.
Based on their configuration, we could expect a drive to fail every 3 to 4
weeks, which was close to what was happening. I don't usually seek to persecute
people but this was a kill or be killed situation. So with my company's
interest at heart, I had to discredit their architect in front of their
management and recommend that they add redundancy in with a ton more drives.
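The prediction follows from simple arithmetic: when any single drive failure
takes the system down, the expected time between failures is the per-drive MTBF
divided by the number of drives. The sketch below assumes independent drives
with a constant failure rate, and the 120,000-hour drive rating is a
hypothetical figure used only to show the calculation.

```python
HOURS_PER_WEEK = 24 * 7


def series_mtbf_hours(drive_mtbf_hours, drive_count):
    """MTBF of a system that fails as soon as any one of its drives fails."""
    return drive_mtbf_hours / drive_count


def weeks_between_failures(drive_mtbf_hours, drive_count):
    """Same figure expressed in weeks."""
    return series_mtbf_hours(drive_mtbf_hours, drive_count) / HOURS_PER_WEEK


# Hypothetical rating of 120,000 hours per drive, 200 drives with no redundancy.
expected_weeks = weeks_between_failures(120_000, 200)
```

With those assumed numbers a drive failure, and therefore a system outage, is
expected roughly every three and a half weeks.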
|
Low K personnel should not be allowed access to production systems
without close supervision.
|
When coworkers demonstrate low K levels, we should isolate them in lab
environments to do their work, then check it thoroughly before letting them
release it to a production environment. That would have averted this nightmare
I am about to describe. This person was assigned to help on the case management
project under supervision because she was taken off another development effort
that required a lot more initiative and K. She added some new rules to the
alert management module and didn't bother to test it well. Next morning in
another timezone earlier than ours, many engineers had their message pagers
flooded with alerts to the point that many of them were simply turning them off.
Several hours later, the new rules were removed and the changes taken into
isolation for root cause analysis (RCA). The details of the bug are not as
significant as my general observations. Low K workers tend to remain low K
because they have no drive to increase their K. Most of the time, they are not
only technically low K, they tend not to be too aware of the business issues as
well, so they don't realize the importance of the systems. And in line with
that, they also tend not to be thorough enough to be left wandering around
production environments as this case demonstrated.
4.6 Troubleshooting
|
Just enough K to do your job is often way too little to do it well.
|
A colleague in a development and integration group approached me to help
troubleshoot a performance problem they were experiencing on a logistics
application. The client was supposedly run in Japan while the server was in
Singapore. They had raised the issue with the network infrastructure group 2
months earlier but the other group kept telling them that they didn't detect
any network bottlenecks. When I investigated the matter, I was horrified to
find out that the client was actually also running in Singapore but X-displayed
over to the warehouse in Japan. Despite our ATM backbone, the event and display
packets would still have to get back and forth between Singapore, Osaka and
Tokyo on ATM, and between Tokyo and Atsugi on leased lines. I immediately
informed my colleague that this situation was not very scalable and a poor use of
bandwidth. And even if I helped to solve this problem, I would hold them
accountable if there were future complaints about performance. It turns out
that the person in the network infrastructure group had no idea what sort of
traffic this application generated and conducted ping timings on each network
segment. The default payload size for a ping is 64 bytes, nowhere close to
simulating X traffic. I employed the same ping test on all the segments, but
with a 32K packet size resembling an X screen refresh, and detected severe
packet loss on the leased line segment between Tokyo and Atsugi. A day after
informing a high K friend I had in the network infrastructure group, she
confirmed my observations and fixed the problem by changing the priority
queuing configuration on that segment. Here we have a classic case of the ball
being dropped in no-man's-land. All the application people knew was that the
response rate on the screens was unacceptably slow. All the network
infrastructure people knew was that the network had no problems for 'normal'
traffic. Real world IT environments are extremely complex and getting more so,
and there is a need for more people with multi-disciplinary skills. Even high K
levels can be insufficient if they are all concentrated in a narrow field; the
K needs to span many boundaries. Don't fall into the trap of identifying your
core competencies and only focusing effort on them. If there are other areas of
K you depend on, someone in your organization had better have a clue about
them.
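One way to see why a small ping can look healthy while a large one does not is
to count IP fragments. This sketch is illustrative only: it assumes a 1500-byte
MTU, IPv4 headers without options, and independent loss of each fragment, none
of which is stated in the account above, and it does not model the queuing
problem that was eventually fixed.

```python
import math

IP_HEADER = 20   # bytes, IPv4 header without options (assumed)
ICMP_HEADER = 8  # bytes


def fragments_needed(icmp_payload, mtu=1500):
    """Number of IP fragments needed to carry one echo request of this size."""
    per_fragment = (mtu - IP_HEADER) // 8 * 8  # fragment data is a multiple of 8
    return math.ceil((icmp_payload + ICMP_HEADER) / per_fragment)


def echo_loss_probability(icmp_payload, fragment_loss, mtu=1500):
    """Chance the whole echo is lost if any single fragment is dropped."""
    return 1 - (1 - fragment_loss) ** fragments_needed(icmp_payload, mtu)


small = echo_loss_probability(64, 0.01)         # default-sized ping
large = echo_loss_probability(32 * 1024, 0.01)  # 32K ping resembling X traffic
```

Under these assumptions a link that drops 1% of packets loses about 1% of small
pings but roughly one in five of the 32K pings, so the larger test exposes a
problem the default test barely registers.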
|
Some inherently complex problems require a lot of K. If effort is not spent
making the necessary K available, more effort will be spent on the problem.
|
Have you ever considered walking into an electronics store and asking to buy a
DVD player that is no longer being sold and doesn't come with a warranty? I
have witnessed the purchase of a large, complex CRM system that makes as much
sense as that. There are always extenuating circumstances for strange decisions
but the jury is still out on the wisdom of this decision. Some cite the
overriding factor as balance of trade, but is anyone actually tracking how much
trade is being balanced? I will not expand the acronyms I use here while this
incident is so recent but enough details illuminate this important lesson.
Despite company S1 merging with S2 and focusing solely on S1's products for the
future, we decided to deploy the ST product from S2 for production use. There
was no real reason why we could not wait for the S2000 product from S1 if we
truly wanted to forge a long term relationship. Since the companies were
merging and ST effectively EOL-ed, all the knowledgeable engineers who worked
on ST left the company. The SCS part of ST was the most problematic as it was
a VisiBroker-based CORBA server written in C++, heavily multithreaded, and had
to run over Bristol Technologies Wind/U as it had been written on Windows NT
synchronization primitives and deployed on Solaris. I kept hearing about how
this server kept crashing or hanging everywhere we deployed it, bringing all
client connections down and forcing our engineers to log in again, losing any
work they had in progress since the last save. Having experience in C++,
VisiBroker, multithreading and troubleshooting such hangs and crashes, I
volunteered to help root-cause the problems despite my ignorance of Wind/U.
My repeated requests for full Purify output to help them determine
if there were memory-related problems were never satisfied. And my suggestion
to have a full development and debugging environment with complete source code
access must have run into political barriers as it never materialized either.
The good reason why everyone lost interest in the problem was that the
situation became less critical because some workarounds were put into the
client to allow it to reconnect to another server when their server stopped
responding. Perhaps I got involved too late to help fix it permanently. What I
do know from experience is that we should have forced the vendor to put in the
right (high K) resources from the start, or done that ourselves. The hundreds
and thousands of man-hours lost, the credibility from our users lost, and the
delays to the project should have been arrested much earlier on.
|
Missing a little K is sufficient to lower E by a lot.
|
While attending a conference, I called a coworker just to exchange the latest
news when he told me a website our division was about to launch was having
severe performance problems. This group was using the Trilogy engine to generate
quotes and it was about 60 times too slow. They were trying to ramp it up for
weeks and their go-live date was at stake. My coworker had been roped in due to
his K and reputation for being a problem solver. He first confirmed some of his
observations with me about the use of various Solaris performance measurement
tools. Then he sought my opinion due to my love of multithreading issues. After
5 minutes of detailed symptom description, I concluded that they must be using
the reference version of the JVM without the native threads pack installed. That
would effectively limit their use of any powerful server to 1 CPU. Three days
later, when the other group heeded my advice, my theory was proven correct and
the new benchmarks showed a 600% improvement; now they were just 10 times too
slow. My
next obvious suggestion of adding CPUs didn't yield much of an improvement so
we were faced with the reality that inefficient design or implementation was
at fault. I suggested using JProbe, OptimizeIt or any other memory profiling
tool to see how hard the garbage collector was working. True enough, samples
taken before and after quote generating showed that many unnecessary objects
were being created and destroyed. If you know JDBC well, you'd agree that
DataSource objects should be Singletons or a limited pool at most. The poor
quality Trilogy code was creating a few of these objects per run and causing
excessive garbage collection. I did not get further involved in this project
but there is an important conclusion we can draw from my earliest encounter.
5 minutes of my time was enough to boost the team past a 600% bottleneck.
Hundreds of man-hours could have been saved if I had been involved the instant
this problem was detected. But in all honesty and humility, the K I possessed to
be able to help them was very basic. Any team attempting a project of this scale
should have known those basics beforehand, not learned them on the job at such
great cost.
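The point about sharing an expensive object instead of rebuilding it on every
run can be sketched as follows. This is a Python analogue, not JDBC code: the
ExpensiveDataSource class, the URL and the helper names are hypothetical
stand-ins, and the shared cache shown here is not thread-safe.

```python
class ExpensiveDataSource:
    """Stand-in for a costly-to-build resource such as a JDBC DataSource."""

    created = 0  # counts how many instances have been constructed

    def __init__(self, url):
        ExpensiveDataSource.created += 1
        self.url = url


_shared = {}  # one instance per URL, reused by every caller


def get_data_source(url):
    """Return the shared instance for this URL, creating it only once."""
    source = _shared.get(url)
    if source is None:
        source = _shared[url] = ExpensiveDataSource(url)
    return source


def quotes_wasteful(url, runs):
    """Builds a fresh object on every run, leaving garbage behind each time."""
    for _ in range(runs):
        ExpensiveDataSource(url)


def quotes_shared(url, runs):
    """Reuses the single shared object on every run."""
    for _ in range(runs):
        get_data_source(url)


quotes_wasteful("db://quotes", 100)
wasteful_count = ExpensiveDataSource.created
ExpensiveDataSource.created = 0
quotes_shared("db://quotes", 100)
shared_count = ExpensiveDataSource.created
```

A hundred runs construct a hundred objects in the first version and exactly one
in the second, which is the difference a memory profiler makes visible.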
|
A little K can be so dangerous that it takes a lot of K to offset it.
|
I was an escalation engineer assigned a Severity 1 kernel panic case from a
foreign air cargo terminal company. This company had long refused to migrate to
our more stable operating system versions and for strange reasons decided to
keep their mission-critical system running on Solaris 2.3. After questioning
our on-site engineers and customer's sysadmins, checking Explorer (system
configuration and log) output and patch levels, I didn't see anything obvious
so I began the tedious exercise of performing analysis on a 2 GB kernel core
image. Our group had the best tools and full source code access so it was just
a matter of time before I would isolate some pattern. Unfortunately, I observed
some symbols which didn't match the kernel source and totally stumped me. Way
out of my comfort zone, I documented all work up to this stage and escalated
the problem to the kernel CTE (corporate technical escalation) group for expert
treatment. A few days later, a CTE engineer notified me that the NFS module was
definitely from Solaris 2.4. I told our on-site engineer to collect the file
checksum of that kernel module and he confirmed that it was indeed from 2.4.
I promptly closed the case and warned the customer that their system would not
be supported until they performed a complete OS reinstallation. So much
bandwidth was wasted because some idiot knew where to replace kernel modules
but not how dangerous the action is. More empirical evidence for my compound
interest hypothesis. IT complexity is such that there may be perhaps hundreds,
perhaps thousands of different ways to build a working system, but certainly
many millions of ways to screw it up.
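Checking installed files against recorded checksums is easy to automate. The
sketch below is a generic illustration, not the tool used in this case: SHA-256
is my choice of algorithm, and the table of expected values is assumed to come
from a trusted record of the release.

```python
import hashlib


def file_checksum(path, algorithm="sha256"):
    """Hex digest of a file, read in chunks so large files are handled."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(expected, checksum=file_checksum):
    """Return the paths whose current checksum differs from the recorded one."""
    return [path for path, recorded in expected.items()
            if checksum(path) != recorded]
```

Run against every kernel module before starting core analysis, a check like this
would flag a module from the wrong release within minutes.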
4.7 Training
|
With proper foundation, acquiring new K is much easier.
|
In 1995, when Java first exploded onto the scene, I was an escalation engineer
in the fly-and-fix team covering Asia. My boss pulled me into his office and
asked if I was interested in being a Java trainer. He said that I was the only
person in the region available with a strong development background and it
would bring the group visibility. I was aware of the history of Java due to
being hungry for K in general. I paid close attention to the Internet (and
maintained a project website since 1993) and attended TOI (transfer of
information) sessions on topics ranging from new microprocessors, cache
architectures, computers and peripherals to new operating system capabilities,
windowing systems, network protocols, compiler features and development
methodologies. So even though it was a tall order to attend a T3 (train the
trainer) then teach my first class 2 weeks later, I pounced at the opportunity
to immerse myself back into a development related activity. The ramp was not
too steep since I had heavy-duty exposure to multithreading (liblwp &
libthread), network programming (UDP, TCP & RPC), and GUI programming (SunWin,
NeWS, XView, Xlib & CDE Motif), had forays into OO (Smalltalk, C++ &
Objective-C) and was aware of garbage collection (Smalltalk & Lisp). I
didn't sleep much the 2 weeks after T3, testing every ambiguity and corner case
I could think of. People who value K tend to feel a moral obligation not to
pass on bogus K to other people. 2 years, 257 students and 5 Asian cities
later, I terminated my ad-hoc role as the first Java instructor in Asia with
average student feedback of 9.2 (the old range was from 0 to 10). The pace of
the IT industry provides ample opportunities for us to increase our K levels.
I think the frequency of these opportunities is proportionate to our existing K
levels.
|
The acquisition of new K is dependent upon one's level of interest.
|
It is no secret that many people are in the IT industry purely for the money.
As an extension of that, they would naturally seek to 'pad their resumes' with
anything that would increase their market value. I had the displeasure of
helping out two such people in the early Java days when we were short on
instructors. These guys would have failed my certification test and should not
have been allowed to teach but for the education manager seeking to beat his
revenue goals. They would go up in front of a class full of eager students and
read off the slides, then scurry back to my office to get me to answer their
students' questions for them. Some people have no pride. Were we grooming Java
instructors or parrots? The relevance of this case study is to show that you
can't force K into a person who is not motivated to truly learn, no matter how
conducive the environment is. It's best to save the resources for people who
are really interested, and talk this other category into non-technical work.
|
An experience is worth a thousand theories.
|
A support engineer based in another country was given a teaching assignment
that I had turned down on principle. His manager continually over-committed his
team to tasks they were not qualified for, and often got my team to help them
out for the company's sake. I was already laden with teaching and consulting
Java on top of my regular escalation queue so I told them I would provide
backline support instead of flying there. The root of the problem was that an
important customer had already been promised the "Multithreaded Programming"
class on-site and nobody in the other team had done it before! It angered me
that many managers had no respect for technical K and assumed they could simply
assign their staff to quickly pick up the K and use it for business.
Invariably, these would be managers who have had little exposure to the type of
technical work IT people do nowadays. At best, they have cruised through their
lives as individual contributors using soft skills and performing primarily
non-technical tasks. Anyway, I prepared this engineer by highlighting the more
difficult parts of the course to focus his study on and encouraged him to spend
the preceding 10 days completing all the exercises, and asked him to contact
me if he needed clarification. During the first 2 days of the course, he would
call me to clarify student questions on thread lifecycle, synchronization
primitives etc. and it seemed to be going well. But on the 3rd and last day, he
called me in a rather frantic tone stating that all the students' programs were
executing in single-threaded mode and they were all totally stumped. He emailed
me some code which looked as though it should have worked, then wandered off
into fantasy land asking if there were kernel bugs or known patches that may
not have been applied on the customer's systems. I questioned him about
detailed output from the programs and what he saw in the /proc filesystem. Then
it dawned upon me that he probably did not know the basics about compiling
multithreaded programs. I asked for his Makefile but he wasn't using one so I
asked for the exact compilation command. They simply forgot to specify "-mt"!
4.8 Consulting
|
Unscrupulous people will take advantage if they think you have low K.
|
We have had a horrendous experience using a contract system from an ISV. It all
stemmed from the mandate to "buy instead of build". Buying instead of building
makes all the sense in the world if you carry it out diligently but our
software procurement has turned out to be the mother of all nightmares from Elm
Street. The root of this evil is that many managers have the misconception that
it is easier to buy than to build. I suggest that it is much more difficult to
do it well. When you build a system from scratch, your focus is on the business
requirements, corporate architecture and interfaces to other systems. With
enough time and talent, you will successfully build and deploy it. Talent with
high enough K might even deliver a highly supportable system resulting in high
E. But when you buy a system, you get sidetracked to features you may not need,
politics, platform and pricing issues. Often, you may not dig deep enough to
ensure that the extensibility of the product takes care of your esoteric
business needs, requirements that users are not going to compromise. IT then
ends up hacking the product to the point that it is no longer supportable or
upgradable, taking away some important benefits of buying instead of building.
Usually, insufficient attention is paid to interfacing with other existing
systems and the cost and effort are not factored into the equation. And last but
not least, there is not enough emphasis put on conformance to current corporate
architecture such that the benefits of having a corporate IT architecture are
lost because of the mish-mash of very different architectures in existence. The
vicious cycle becomes complete when procurement is not conducted in a very
technical manner because the higher K workers will leave for more
engineering-oriented jobs. Once the ISVs know they are dealing with a lower K
workforce, they will hold your company ransom, charging an arm and a leg to
slightly tweak a possibly deficient system when requested. Our experience with
our contract system not only fits the above description, it was so inefficient
that we had to spend many millions upgrading our networks to carry the load.
My boss assigned me to attend a full day presentation by the same vendor to vet
their next generation system architecture. They had been working with us for a
few years and were quite familiar with our architectural direction. All the
right buzzwords were in there, CORBA interfaces, EJB server, Java APIs etc. But
by the afternoon, I nitpicked enough to get really suspicious about their
implementation and decided to maintain my poker face and go in for the kill.
Chatting casually to their chief architect during a coffee break, I got him to
say what was obviously omitted from all the slides. It was being built over a
Distributed Smalltalk core, something we were not willing to accept. They knew
that and attempted to window dress the new system with buzzwords that were in
line with our company's direction. Although their next generation looked
significantly better than their current (not a difficult feat), the last thing we
needed was a painful migration to another extremely proprietary platform that
we weren't going to build expertise on, with lots of excessively expensive
consulting necessary to deploy and maintain. With sufficient K, another snow
job averted and millions saved for my company.
|
With proper foundation K, simple mistakes can be avoided.
|
As a result of teaching many students Java programming (not my job), an
important customer made a special request for me to provide them consulting
(also not my job). Another $30K revenue for my company and higher visibility
for my escalation group was hard to turn down for my boss so I was rewarded
for my extra skills with a lot more extra work (that's how it works). The
customer operated the busiest port in the world and was having performance
problems with an applet they had written for cargo manifest manipulation. This
was in the early days of JDK 1.0.2 where object serialization was not available
so they had to devise their own protocols to transport a large hierarchy of
objects representing the cargo within containers on a ship. They used an encoding
scheme based on concatenating strings with delimiters and found that
transmission time for the cargo of a large container ship took 50 seconds. When
I got on-site, the first thing I taught them was to instrument their code to
profile the transaction and determine where all the time was being spent. It
turned out that they were simply using the "+" operator to concatenate strings,
which are immutable in Java, instead of using the StringBuffer class. After
explaining the difference and helping them rewrite the encoding and decoding,
transaction time dropped down to a more acceptable 6 seconds. I chided a few of
them for not listening carefully in class and encouraged them to look at their
notes to strengthen their foundation. Most expensive mistakes come from a lack
of proper foundation K.
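The cost of building a long string piece by piece can be modelled by counting
characters copied. The fix in this case was Java's StringBuffer; the sketch
below is a Python analogue with hypothetical field data, and it counts copies
rather than timing anything because CPython can sometimes optimize in-place
concatenation.

```python
def chars_copied_naive(pieces):
    """Characters copied if every '+' builds a brand-new immutable string."""
    copied = 0
    length = 0
    for piece in pieces:
        length += len(piece)
        copied += length  # the whole result so far is copied again
    return copied


def chars_copied_buffered(pieces):
    """Characters copied if each piece is appended to a growable buffer once."""
    return sum(len(piece) for piece in pieces)


def encode(fields, delimiter="|"):
    """Delimiter-separated encoding built in a single pass."""
    return delimiter.join(fields)


# Hypothetical manifest: 1000 fields of 10 characters each.
fields = ["x" * 10] * 1000
naive = chars_copied_naive(fields)
buffered = chars_copied_buffered(fields)
```

The naive count grows with the square of the number of pieces while the
buffered count grows linearly, which is why the gap widens so quickly for a
large container ship.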
I hope the case studies I detailed have stirred up memories of your own
experiences.
There are a few recurring themes one can pick up from my case studies. They are
consolidated here to reinforce the argument that there is a compound interest
type of effect at work.
When K << Kmin, the probability of failure is very high. The main
reason is that the more K you are missing, the more likely you are to NOT know
what K you are missing nor how to acquire it. Success or failure is not
taken as a binary function here. Failure can come in the form of late delivery,
increased costs and/or lower quality. Even though the first 2 forms are more
explicit, low system quality can be much more insidious. It can end up costing
an IT department much more than the project itself! Owing too much money and
never being able to earn enough to pay off the loan is a perfect analogy.
After a system goes live, there is natural resistance to perform a major
overhaul. Besides the obvious political incorrectness for a group to admit its
inadequacy, the overriding reason is now that the business will not tolerate
any prolonged disruption. Low quality systems use up a lot of resources for
support and maintenance. Resources that could otherwise be put to acquiring K
for design and implementation of higher quality systems. So begins a vicious
cycle that can bog down departments or divisions for years.
I didn't intend to stretch the financial analogy to its limits but this
particular comparison is highly relevant. If you take care of your monthly cash
flow but neglect retirement planning, you are at financial risk. If you take
care of both but don't buy a property, you could get into trouble if property
prices skyrocket and raise your rent significantly. And just to flog the dead
horse to shreds, if you take care of all of the above but don't have any form
of insurance coverage, your family is still susceptible to financial disaster.
You get the point: personal finance dictates a holistic approach, and so does
IT work. I've met managers who have stated that their staff are "application
people" who don't need to be trained in other areas like system administration,
databases or networks.
-- WRONG! --
Unless all the necessary specialists are ALWAYS accessible to their group in a
proactive and reactive way, they are missing the first clue about IT systems.
An application invariably runs on top of an operating system which controls
access to limited resources. Even if your application has very few users and
immensely powerful client and server machines, there is always a risk of
creating performance problems from ignorance of how the underlying operating
system works. It is safe to say that all enterprise applications are accessed
through networks. I have also observed several applications perform poorly and
waste network bandwidth because the IT people involved didn't have enough K to
factor in the network. All useful systems also need to access databases. It is
well known that typical applications get the biggest performance boost from
application level tuning, but database tuning comes in a very close second.
Expertise in the workings of databases can avoid many performance problems.
Performance is just one aspect of it. Failure to understand the central issues
in any of the diverse areas can also manifest itself in potentially more
serious issues like reliability or data integrity. Many organizations are
infected with the "we're not in charge of that" disease. There are usually
separate infrastructural groups handling the networks, data and application
servers, client desktops, helpdesk etc. Problems occur because groups in charge
of applications don't train or staff up for the multi-disciplinary K that is
necessary. Upper management is lulled into a false sense of security that the K
exists somewhere in the organization but what they fail to realize is that the
people with the K don't talk to the application folks every day. We are taught
from a young age, in the context of healthcare, that
"Prevention is better than cure".
Unfortunately, in this case, cure can be much more expensive because people
with the right K to fix any platform-related problems probably won't know the
application and may have to take time to understand it before they can locate
and correct the problem. In the worst case (which I've seen too often), the
application may be implemented in a way that doesn't lend itself to any quick
fixes.
When children are taught to save from an early age, they internalize the
benefits of delayed gratification for the rest of their lives and are less
likely to get into financial trouble. If taught more comprehensive approaches
to financial planning, they may even execute a plan that insulates their
family monetarily from extreme misfortune. I contend that it is possible to
make such an approach work in the IT world as well. I should know because I
have lived on both sides of the fence and spent the latter part of my career in
positive K-flow. In the early years, I worked in a few groups that supported
mission-critical factory applications. Half the time, our pagers interrupted
development work in progress and forced us to troubleshoot emergencies that
were hindering production. The rest of the time, we were enhancing systems that
weren't designed to easily adapt to business process changes. New screens or
reports often caused chain reactions of performance degradation or incorrect
behaviour. I had a deep sense then that rewriting some of the applications from
scratch may have been more cost effective than our patchwork. The biggest
turning point for me was joining a technology group which had the charter to
determine best practices and tools, and lead department-wide standardization.
We were given time to read, experiment, meet vendors, compare tools and
techniques, and build prototypes to validate our ideas. We acted as a sounding
board for other groups to bounce their ideas off, and we pitched in to help
projects requiring expertise we had acquired. It was this foundation that
brought my K level up to the point that, as technical lead for one of the
geographies in a distributed development project, I contributed to building a
new test system that required an order of magnitude less support than before,
and offered extensibility, customizability and scalability to the point that
it is still in use now 8 years after FCS. My high K manager encouraged our
pursuit of K in diverse areas that offered potential for productivity
improvements. We introduced incremental compilers and memory checkers to the
mainstream and investigated OO languages and databases. It was in such a
community that I realized the value of technical journals, interest group
meetings, TOI sessions, talks, trade shows and conferences. As a result, I
picked up on hardware capabilities, C++, system interface programming, UNIX
internals, CORBA and OO design techniques many years before I had to use them
for serious work. Through activities as mundane sounding as open houses at our
research labs, I have gained exposure to powerful languages like Self and
lightning fast implementations like PJava. I was aware of work like the slab
allocator and the zero copy framework for I/O long before those features and
others from Spring made it into Solaris. It is exposure to these types of ideas
that enhances one's base K and pushes the bounds of one's creativity outwards and
upwards. It is an unending road. I will always have gaping holes in my K but
with diligence towards proactively mastering frameworks like J2ME, J2EE
(Connector, CORBA, EJB, JDBC, JMS, JNDI, JSP, JTS, Servlet and XML) and Jini,
and immersive APIs like JavaSpeech, JMF and JTAPI, I will hopefully continue to
be able to contribute at a high E level.
I have spoken to many demoralized people in IT groups working in deficit
spending mode. Most of them are either developing against impossible deadlines
with deficient tools and insufficient K or supporting unstable systems that
demand lots of tedious activity that does not help them build up their K. When
people have to face unpleasant situations all the time, are set up for failure,
or are not given a sense that they are developing themselves, they will not
perform at their best.
Unless lots of human resources are wasted on these situations, there
will be a general downward spiral in the health of the organization and
systems. I've seen a system so lame that it has to be patched every week, and
another that crashed or hung so often that the implementation team was fully
occupied for a while, just reporting on the failures, discussing them amongst
themselves, explaining the situation to the users, and building scaffolding to
prop it up artificially.
Often, these undesirable situations are of the magnitude that management cannot
ignore, yet there is a lot of hesitation to admit the full extent of these
mistakes. Only with the acceptance that it is more cost effective to replace
rather than maintain the really bad systems will the organization be able to
get out of the rut. Reorganizations would actually serve this purpose well. New
management is much more likely to recognize the true extent of the problems
than those who let it happen in the first place. Unfortunately, what happens in
reality is that funding is poured into building different new systems instead
of backtracking to fix old systems. A close analogy is a person in negative
cashflow who instead of controlling his expenses, spends his energy on starting
a sideline business at night. But if he has not learned the real lessons, the
sideline will probably be operating at a loss as well.
On the flipside, when IT staff are well trained, they will be confident, calm
and collected, performing their jobs in an optimal manner. When faced with
challenges, their already high morale will provide them guts and composure to
execute whatever needs to be done at high E. This is simply the way humans
are. We see it in sports teams, and we can certainly see it in IT teams if we
look closely enough. Just as it is difficult to maintain financial equilibrium,
i.e. our assets are either accruing or dwindling at some rate, an IT group's
aggregate K and E levels are either spiraling up or down due to the human
factor. It is up to us to ensure the direction is up by focusing on K.
There are many types of software engineers. Whether you're an IT architect,
technical lead, business analyst, systems integrator, professional services
consultant, programmer, technical writer, systems administrator, database
administrator, instructor, support engineer, language lawyer, tools-smith,
test engineer, GUI designer, configuration manager etc. etc. (you get the
picture), this hypothesis and recommendation probably still applies to you.
Unless your job is really simple (in which case you might still want to follow
my recommendation since you are easily replaceable :-), the knowledge it
requires is probably built on numerous layers of other knowledge. You may know
the most applicable layer well, but it is in your interest to make sure you
also know some basics about all the other layers. Try to shake off the "I don't
really need to know that" attitude. If you really cannot be bothered to extend
your K levels beyond the immediately necessary, make sure you have people with
K in the other layers involved in your projects. A close parallel I can think
of is driving a car. People who spend the least and get the most out of their
vehicles are those who know how to take good care of them and do most things
themselves. If you can only be bothered to drive and pump fuel in when the 'E'
light comes on, the least you should do is make sure a mechanic looks at it
every once in a while. Consequences of not doing so can range from a dirty
windscreen due to no washer fluid (inconvenience) to blowing your head gasket
because your engine oil level is too low (total failure).
If you are already in the high K zone, don't let up. Maintain a high bank
balance until you are approaching retirement. One has to put in considerable
effort to remain at the cutting edge of software technology. I have heard
estimates that the half life of skills you have is approximately 3 to 4 years.
Conservatively speaking, if any K you have took 4 years to be worth K/2, you'd
have to replenish 16% of your knowledge every year just to remain at the same
level. If the half life is 3 years, you'd have to replenish 21% of your skills
each year. And if you've been struggling like I have to keep pace with the rate
of change surrounding Java and XML, you might agree that the half life feels
more like 1 or 2 years! But there is no need to get alarmed, since specific
skills are just one part of the equation. K is comprised of specific skills as
well as experience. Experience lasts a lot longer, especially if you have
internalized it. For example, when Java burst onto the scene, it obsoleted my
knowledge of lex and yacc as those were C-based tools. But when I learned
JavaCC, my experience with syntactic rules and parsing was very helpful in
ramping up my skill with the Java-based tool. The point is that each individual
is responsible for his or her own K development. No matter what type of IT job
you have, it is in your own interest to acquire or maintain a high K level.
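The replenishment figures quoted above can be checked with a few lines of
Python. This is only a minimal sketch: it assumes that K decays smoothly and
exponentially (which is my guess at how the 16% and 21% were derived), and the
function name annual_replenish_rate is hypothetical, made up for illustration.

```python
def annual_replenish_rate(half_life_years: float) -> float:
    """Fraction of K lost per year, assuming exponential decay.

    If K halves every `half_life_years`, then after one year the remaining
    fraction is 0.5 ** (1 / half_life_years); whatever is lost has to be
    replenished just to stay at the same level.
    """
    if half_life_years <= 0:
        raise ValueError("half_life_years must be positive")
    return 1.0 - 0.5 ** (1.0 / half_life_years)


if __name__ == "__main__":
    # Half-lives mentioned in the text: 4 and 3 years, then the
    # "feels more like 1 or 2 years" case.
    for years in (4, 3, 2, 1):
        rate = annual_replenish_rate(years)
        print(f"half-life {years} yr: replenish about {rate:.0%} per year")
```

Under that assumption, half-lives of 4 and 3 years give roughly 16% and 21%
per year, matching the figures above, while half-lives of 2 years and 1 year
give roughly 29% and 50%.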
I will not get deep into the subject of high quality versus low quality K.
Reading what I've written so far would already tell you that I think it's
better to know fewer things very well than to have superficial, and sometimes
dangerous, knowledge of many things. So if there is K you need to know for your
job that you feel shaky about, backtrack and solidify your foundation before
going forwards to acquire new K.
If I have convinced you that it is worth your while to be on the
positive cashflow
side of compound interest, I am happy that I've shared something useful.
If you follow or are already practising the above recommendations, then I'm
happy for you. You must be as happy at your job as I am.
Yes, despite my conviction to keep politely declining any 'promotions' to
managerial positions (10 and counting) for reasons that I will not disclose
here, I do give a damn about you because your conduct or misconduct affects
the lives of many who are trying to make a living as IT workers.
One of the most significant contributions I've made to my company is to
actively participate in the recruitment process. I have interviewed more than
300 people (I know because I keep the resumes) for positions in various
departments, and this has resulted in the hiring of almost 20 good people (not
to be confused with 20 almost good people :-). Perhaps more significant is the
rejection of the others who were either under-qualified for the job (K <<
Kmin), didn't fit into our corporate culture, or didn't have the
integrity to represent their skillsets accurately. Too often have I observed
managers hiring the wrong people because they rushed to fill up positions so as
not to 'lose the headcount'. Worse yet, I've seen managers hire the wrong
people because no due diligence was applied to ensure that candidates truly
knew what they claimed to know. When a ringmaster is
"Hiring a Juggler"
for the circus, the resume and interview are one consideration, but isn't it
most important to get the juggler to juggle in front of you? It is hard work
testing every promising looking candidate to see if they are for real, but it
is much harder work trying to get someone without the K to be productive. You
need high K to ensure recruitment of high K personnel. If you don't have such
resources at your disposal, borrow a high K resource from another group just to
help with the interview process. Remember that managers are partly measured by
what their direct reports produce, so it is in your interest to hire people
with high K to salary ratios.
Whether you are in the midst of whirlwind recruitment or a hiring freeze,
chances are you already manage quite a few people. Your job is to maximize these
resources in pursuit of your organizational charter. The single most important
long term activity you can undertake is to interactively supervise each
person's career development plan, align their personal goals with the goals of
the group, and encourage them to raise their K level every year through
training and experimentation. Most managers I've observed employ
'Spanish Theory Management'
and try to squeeze as much of the 'fixed value' out of employees as possible.
The better alternative is to invest in people by raising their capabilities so
the compound interest effect can kick in. Not only will they produce more and
better value for you in the future, but you will have differentiated yourself
so much that they will probably follow you wherever you go.
My final recommendation for you goes hand-in-hand with the previous. I
recommend making a conscious choice of quality over quantity unless commanded
by your managers to do otherwise. It seems to be in vogue to report as many
milestones in as little time as possible. One seldom hears anyone talking about
the high quality of any system, only that it FCS-ed or went into production.
If you change the focus of your group to quality instead of quantity, you would
at least be pointing your people in the right direction to positive cashflow.
The next step is to not cave in to external pressure when you know the right
thing to do is budget enough time for projects to include architectural and
design reviews, prototyping if applicable, load and regression testing,
features for supportability, and mechanisms for user customization. The more we
pay up front to make sure a system is sound and usable, the less we have to pay
later (with compound interest) to keep it going. Once quality becomes a way of
life in your organization, quantity will take care of itself; your high K staff
will want to produce more to showcase their high E.
Despite the lack of formal proof, I hope I've convinced you of the existence of
a compound interest effect in the IT world.
I suspect some of the same compound interest magic is at play in other industry
sectors as well. I know it is more noticeable in software than in hardware due
to the intractable nature of software. If a business system is klunky, one can
still fix data by hand or work around it using some maintenance screens; but
when a microprocessor fails, system failure is usually total. An incompetent
(low K) hardware engineer is less likely to cause catastrophic harm than a
comparably incompetent software engineer because hardware failures are usually
more obvious. In all probability, the low K hardware engineer will not be
assigned more responsibilities after demonstrating his or her inadequacy. Since
software is so easily patched on the fly, a low K engineer can rise up the
ranks from writing lousy routines, then lousy programs, up to lousy collections
of systems. I know this because I have witnessed this first-hand. In a culture
where there is no time for regular design and code reviews, an engineer's
political skills are far more important for career advancement than his
technical skills. The pace of the industry is such that upper management is
more concerned about immediate problems and willing to live with the existing
systems that 'sorta kinda work'.
Abstracting yet another level up, one can say that most human endeavours offer
the same choice between instant and delayed gratification. While that may be
true, I suspect it is more pronounced in IT for 2 reasons. The first is that
IT work is much more knowledge-centric than effort-centric. One way to
elaborate on this is to compare a software engineer to a neuro-surgeon. The
surgeon may have taken 10 to 15 years to complete specialized training but his
contribution to the world still occurs one patient at a time. He has to operate
for hours at a time, and only affects the patient, and his team to a lesser
extent. Even if he contributed to the body of knowledge by becoming a medical
researcher, other surgeons only benefit by reading and internalizing his work.
A programmer, still in high school, may come up with a revolutionary new
algorithm that improves an area of graphics processing performance. If this
were incorporated into the Java Development Kit, millions of other programmers
and users would benefit without even lifting a finger. The second reason for
the compound interest being more pronounced is that software engineering is
highly collaborative in nature. Using the above example in a negative light,
even if the surgeon made judgement errors, he can only kill one patient at a
time. Though the typical software bug may not result in fatalities, chances are
high that it could cost hundreds of people dozens of hours to detect it and
find a workaround. This is exacerbated by a lack of mandatory certification and
licensing in the software industry.
If you generally agree with what I've presented, chances are you have a wealth
of your own case studies to share with others. I am not going to incorporate
anyone else's case studies on my site but I'll be happy to maintain a list of
links to your own passionate venting.