Enterprise Over-Caching

22 October 2009

Caching your database tables within an app server for a website (e.g., Memcached or C#’s Cache) is a very bad idea.

Every time you add caching to a set of data, you have to add extra code – which means added complexity and potential bugs. You’re adding consistency issues as well – especially if any other apps (or DBAs) can modify the data without invalidating the cache. These problems can be solved of course – with more code, which means even more complexity and bugs.
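To make the “extra code” concrete, here is a minimal cache-aside sketch in Python. Everything in it is hypothetical: a plain dict stands in for Memcached, and `FakeUsersTable`, `CachedUsers`, `get_user`, and `update_user` are illustrative names, not any library’s API. Every write path has to remember to invalidate, and any writer that goes around the cache leaves it stale.

```python
from typing import Dict, Optional


class FakeUsersTable:
    """Hypothetical stand-in for a Users table in the database."""

    def __init__(self) -> None:
        self.rows: Dict[int, str] = {}
        self.reads = 0  # how many times the "database" was queried

    def select(self, user_id: int) -> Optional[str]:
        self.reads += 1
        return self.rows.get(user_id)

    def update(self, user_id: int, name: str) -> None:
        self.rows[user_id] = name


class CachedUsers:
    """Cache-aside layer in front of the table: extra code the app now owns."""

    def __init__(self, db: FakeUsersTable) -> None:
        self.db = db
        self.cache: Dict[int, Optional[str]] = {}

    def get_user(self, user_id: int) -> Optional[str]:
        if user_id not in self.cache:  # miss: load from the database and remember it
            self.cache[user_id] = self.db.select(user_id)
        return self.cache[user_id]

    def update_user(self, user_id: int, name: str) -> None:
        self.db.update(user_id, name)
        self.cache.pop(user_id, None)  # invalidate; forget this and reads go stale


# Usage: a write that bypasses the cache layer leaves stale data behind.
db = FakeUsersTable()
db.update(1, "alice")
users = CachedUsers(db)
users.get_user(1)    # loads "alice" from the database
db.update(1, "bob")  # a DBA (or another app) edits the table directly
users.get_user(1)    # still returns "alice": the cache never heard about "bob"
```

Writes that go through `update_user` stay consistent; the direct `db.update` call is exactly the kind of change the cache cannot see.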

Most importantly, let’s consider speed. If you’re caching data, it’s probably because of speed concerns in the first place. One way or another, you’re going to need to access this data. Even worse, you probably need to sort the data too. Why the heck would you tiptoe around the database like you’re scared to pull data from it? That is exactly what the database is tuned for! For requests that filter and sort a whole table, doing the work against the cache in application code is significantly slower than pulling from an indexed database table.

Why not handle data access speed in your data layer? Rather than using each app server as a horribly inefficient slave database – use a slave database! This is going to be faster, more scalable, and require zero extra code.

Caching isn’t all bad, of course. Caching the static content of the 10 most popular articles that account for 70% of your website’s requests is probably a good idea. Caching your entire Users table? Not such a good idea. For some reason premature caching of entire database tables seems to be the norm in “enterprise” C# and Java web apps, even ones with a handful of users.
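By contrast, caching a handful of hot pages can be a few lines of code. A minimal sketch in Python, assuming a hypothetical `render_article` function and an in-memory `ARTICLES` dict in place of a real query: `functools.lru_cache` keeps the 10 most *recently* requested pages, which only approximates “the 10 most popular articles” but needs no bookkeeping of its own.

```python
from functools import lru_cache

# Hypothetical article store; on a real site this would be a database query.
ARTICLES = {i: f"Article {i} body" for i in range(100)}


@lru_cache(maxsize=10)  # keep at most 10 rendered pages in memory
def render_article(article_id: int) -> str:
    """Render one article to HTML; the work only happens on a cache miss."""
    body = ARTICLES[article_id]
    return f"<html><body><p>{body}</p></body></html>"
```

`render_article.cache_info()` reports hits, misses, and current size, which is a cheap way to check during profiling whether the cache is earning its keep.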

If you get put on a project like this, suggest scaling back the scope until it’s needed. Or at least until after the application is ready to be optimized based on performance profiling. Don’t add that code until there is justification that a performance problem exists, and that adding code is really the best way to solve it.

Test-first Coding, Unit Tests Not Required

16 July 2008

Some developers don’t need to test their code. They think up a grandiose scheme in their head, program every last line of it, and then commit the code to version control. The code may or may not even be used in the application, but that’s ok. The goal is getting more lines of code saved, right?

All wrong. More code is not “more better” – just the opposite. It’s more important to have simple, well-tested code that covers every scenario you actually encounter than to have piles of “wishful thinking” unused functions and “just in case” error handling.

Here are a few guidelines that can help any developer or project improve in quality:

1. All new code should be tested repeatedly while being written.

This is a code-level equivalent to Agile’s principle of frequent reviews with the client. Everything can be fixed faster if it’s caught sooner. Coding without testing along the way is like trying to paint blindfolded. Sure, you might be able to paint over the screw-ups when you finally take the blindfold off, but why not do it right the first time?

There are many benefits to testing during coding. By testing as you go, you’re more likely to work on functional, testable features. Who hasn’t seen or coded something where an enormous amount of time was spent on some elaborate framework, only to find in the end that it wasn’t even what the project needed? By starting with testable functionality you force yourself to follow the important mantra of:

Make it work, make it right, make it fast. In that order.

It’s always easier to refactor code to be “right” if it’s already complete and working. If you write perfect, pristine code that does the wrong thing, it will likely need to be trashed entirely to be able to create working code. Testing as you write is the surest way to make your code do what the application actually needs.

2. All new code should be tested and working before being committed to the repository.

This should go without saying, but somehow it does require saying. If code doesn’t work, it absolutely should not be committed. I’m a firm believer in keeping your repository sane throughout development. If you break a page or function and commit it broken, you’re preventing others from getting their work done. There’s also a good chance that you’ll leave that code broken due to some distraction, at which point it’s possible the broken code will get launched.

If you get distracted by a more urgent priority, create a second sandbox to work on that from. For a larger refactoring or a big new feature, create a project branch. These tactics allow you to mangle the code in any way imaginable, while still letting your teammates work and launch incremental improvements or bugfixes as necessary.

Even if you only made the tiniest of changes affecting something, test it. More often than not, you’ll be surprised to find some kind of syntax or logic error that would have gone to production because of your carelessness.

In an ideal world QA is going to review the code before launch, but let’s be realistic: Most projects don’t have the time for that. When it comes down to it, it’s your code, and you’re responsible for it. QA has enough to worry about without developers being outright careless.

3. All code should be in use, or be deleted.

Even if you’re pretty certain a function will be needed down the line, don’t do more than creating a stub for it. By implementing code that isn’t used, you’re immediately breaking rules #1 and #2. You can’t test it while you work on it OR when you’re finished with it – it’s not used anywhere! Unit testing sort of gets around these problems, but what’s the point? You’ve now got a unit test AND a function, both of which may or may not match functionality that doesn’t exist yet and may never be needed.

Note that this rule doesn’t just apply to new code! If you remove all references to a function, remove the function too. This is every bit as important as not creating unused functions in the first place. Unused code will rapidly become broken or deprecated. Later on, developers will find this function and attempt to use it, with the expectation that it works correctly. In addition, you have an exaggerated codebase of unused functions – so any attempt at refactoring or adding on to the codebase is more difficult than it should be.

These are some of the most important concepts to drill into your junior developers’ heads. I’m grateful to my Comparative Languages teacher at Virginia Tech for getting me started on the right path during college. Combine these concepts with a version control system like Subversion, and you can successfully refactor even the most horrific of codebases. Trust me – I have!

Interfaces, not Objects

15 July 2008

When I was young I used to hate vegetables and Object Oriented Programming. I learned C++ in high school and college, but was turned off by the overly elaborate and useless example projects. Especially in C++, the obtuse syntax and strict typing made OO painful to work with.

I’ve grown a lot since then. I’ve been programming OO for several years, even refactoring a large functional codebase into objects. I also eat vegetables, as long as they’re sautéed, steamed, or cooked with other food. But, like the mushy microwaved carrots my parents used to make, there’s still a lot to hate about Object Oriented Programming.

For many programmers OOP amounts to little more than code obfuscation. They go wild imagining the most complicated set of objects possible to solve a problem. You’d think they were handing out prizes for checking off every OO language feature. When the coding is done (if it ever gets done!) you have a mess of spaghetti that’s impossible to read or debug. One function call in your front end might require tracing through 8 different files to figure out what code is actually being executed.

The goal of OOP is not to stuff everything into as many objects as possible, and reimplement the = operator with Getter/Setter functions. Even if that’s what your college professors told you.

A better approach to OOP is coding for interfaces. Figure out the functions and data structures that would make your front-end the easiest to write. Now implement that, and only that. Don’t worry about having a few blocks of code that are moderately similar. First things first, make the code work. You can factor out those similar blocks of code into other objects or functions later if you need to. There’s a good chance that time will never come.

Every object you write before you need it is adding complexity, difficulty, and time to your project. Start with the top-level interfaces and refactor complexity in only as needed to implement real functionality. If you try to account for every single possible scenario and abstraction, it could take you months just to write “Hello World”. I once interviewed someone whose company spent 6 months working on a simple poker game and still couldn’t get it working. Unfortunately, that mindset of perfection (at the cost of results) comes naturally to developers, and can be difficult to overcome.

The example that immediately comes to mind is the Active Record pattern. When you start a website, you should have a basic interface where you can transfer your objects to and from the database using straightforward object.save() and object.get() calls. These calls can be hard-coded for MySQL, and even use hand-typed INSERT, UPDATE, and SELECT statements. It works, things are cool. The project zoomed by because you only implemented the objects and functions you needed.
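A minimal Active Record sketch, in Python for illustration. It assumes a `users` table with `id` and `name` columns and uses the standard-library `sqlite3` module as a stand-in for MySQL so it runs without a server; `User`, `save()`, and the class-level `get()` are hypothetical names. The SQL is hand-typed, just as described.

```python
import sqlite3
from typing import Optional


class User:
    """Active Record sketch: each object knows how to save and load itself."""

    def __init__(self, conn: sqlite3.Connection, name: str, user_id: Optional[int] = None) -> None:
        self.conn = conn
        self.id = user_id
        self.name = name

    def save(self) -> None:
        # Hand-typed SQL: INSERT for a new row, UPDATE for an existing one.
        if self.id is None:
            cur = self.conn.execute("INSERT INTO users (name) VALUES (?)", (self.name,))
            self.id = cur.lastrowid
        else:
            self.conn.execute("UPDATE users SET name = ? WHERE id = ?", (self.name, self.id))
        self.conn.commit()

    @classmethod
    def get(cls, conn: sqlite3.Connection, user_id: int) -> Optional["User"]:
        # Hand-typed SELECT; returns None when no row matches.
        row = conn.execute("SELECT id, name FROM users WHERE id = ?", (user_id,)).fetchone()
        return cls(conn, row[1], row[0]) if row else None
```

Callers only ever touch `save()` and `get()`, so the storage details behind them can change later without touching the front end.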

Uh-oh! Your site was a hit and you’re getting slammed! Now you need to switch to an Oracle back-end with MySQL slaves. Not a problem. Because your interface is sane, you can modify your objects’ save() and get() calls. Change save() to connect to Oracle and trigger cache updates, and change get() to use the MySQL slave DBs. Wow, that was pretty darn easy.
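One way to sketch that change, in Python with hypothetical names (`ReplicatedDB`, `write`, `read_one`, `User`): `save()` routes through a master connection and `get()` rotates across replica connections, while the code calling them stays the same. Several SQLite connections to one shared file stand in for the Oracle master and MySQL slaves; in a real deployment the databases handle replication themselves, and the cache updates mentioned above are left out.

```python
import itertools
import sqlite3
from typing import List, Optional


class ReplicatedDB:
    """Route writes to one master connection and reads to replica connections."""

    def __init__(self, master: sqlite3.Connection, replicas: List[sqlite3.Connection]) -> None:
        self.master = master
        self._replicas = itertools.cycle(replicas)  # round-robin over the read copies

    def write(self, sql: str, params: tuple = ()) -> Optional[int]:
        cur = self.master.execute(sql, params)
        self.master.commit()
        return cur.lastrowid  # row id after an INSERT, otherwise None

    def read_one(self, sql: str, params: tuple = ()) -> Optional[tuple]:
        # fetchall() runs the statement to completion so the replica holds no lock.
        rows = next(self._replicas).execute(sql, params).fetchall()
        return rows[0] if rows else None


class User:
    """Callers still see only save() and get(); only the internals know about replicas."""

    def __init__(self, db: ReplicatedDB, name: str, user_id: Optional[int] = None) -> None:
        self.db = db
        self.id = user_id
        self.name = name

    def save(self) -> None:
        # Writes always go to the master.
        if self.id is None:
            self.id = self.db.write("INSERT INTO users (name) VALUES (?)", (self.name,))
        else:
            self.db.write("UPDATE users SET name = ? WHERE id = ?", (self.name, self.id))

    @classmethod
    def get(cls, db: ReplicatedDB, user_id: int) -> Optional["User"]:
        # Reads go to whichever replica is next in the rotation.
        row = db.read_one("SELECT id, name FROM users WHERE id = ?", (user_id,))
        return cls(db, row[1], row[0]) if row else None
```

Real replicas lag behind the master, so a read issued right after a write may not see it yet; the shared file here hides that, which is one reason this is only a sketch.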

If you end up needing to refactor into a Factory Pattern later on – well, great! That means your site was a success! You’ve found the need to add a great deal of complication to your site, and that’s OK.

But do you really think your site would have made it if you had taken the time to make every object imaginable when you first started the site? Would you have been able to hire developers and gotten anything useful out of them for the first 6 months while they tried to decipher your over-OOPing? The answer to both is ‘probably not’.

But I Still Hate Flash

26 June 2008

Today I did the unthinkable. I’m ashamed to admit, it was exhilarating. I programmed in javascript.

Ok, so it was just a dynamic dropdown changing based on other criteria selected in the form. Not exactly Gmail or Overpages. But still, this is Me we’re talking about. I turned off javascript entirely for the first 3 years it was included in Netscape.

A few things helped me transition to this break-through. Firefox added better pop-up blocking, and disabled needlessly malicious javascript features like “make my browser window resize and bounce all over my screen”. Google released amazing interactive apps like Gmail that fixed web usability problems I never even realized existed. I got a MacBook, and Safari actually keeps tabs to themselves, instead of letting javascript steal the focus. And, when it comes down to it, javascript became so ubiquitous that it’s impossible to actually use the intertubes with it turned off.

That said, I still hate Flash.

I should be more specific – I hate “Flash websites”. You know the kind – the ones that take 30 seconds to load and attempt to re-implement basic web functionality like links, scrollbars, and form elements, only this time without any of the basic usability you’ve come to depend on to navigate the web: opening links in new tabs, increasing font size, copying text, pasting URLs to friends, and so on.

I still use Flash Block in Firefox to prevent Flash from running without my explicit permission. That started one day when I went to TV Guide’s website. They had a Flash advertisement on the page where a race car literally started driving around on my web browser, complete with loud obnoxious engine sounds blaring out of my speakers. I actually uninstalled Flash that exact moment, and didn’t reinstall it until I heard about Flash Block. With Flash Block I can play YouTube videos and similar sane Flash elements, while avoiding the tasteless ads and getting fair warning before subjecting myself to a horrific all-Flash website.

In truth, Javascript can still be used in equally evil ways. Pop-up ads when you click anywhere in a page, for instance. Or links that, instead of being an actual link, include some obfuscated javascript function, preventing browser functionality like “open in new tab” or “bookmark” from working. Heck, even with bare html, there was still Geocities, and now Myspace, to worry about.

I guess it all comes down to usability. Give site designers and developers reasonably powerful tools, even if it’s enough rope to hang themselves with. Now the really hard part is convincing your PHB and Marketing not to hang themselves.

Now there’s an idea. If only bad flash sites really did hang their designers…

PHP vs. Perl: A Retort to Slashdot’s Perl Mongers

09 June 2007

My company uses both PHP and Perl. PHP runs the website, and the back-end scripts were originally written in Perl. We’re gradually rewriting the back-end scripts in PHP.

We wanted to stick with a single language to promote code reuse, and PHP was an obvious choice over Perl. It’s a full-featured scripting language, yet extremely fast with an opcode cache. It’s easy to use, but also implements full object oriented class design. We use it for TCP/IP socket connections to internal C applications, and for XML-RPC and SOAP web service connections internally and to external partners. In general, we’ve yet to find a niche which PHP can’t fill.

With Perl, we constantly run into script failures when servers are upgraded or reinstalled and some random Perl module is missing or incompatible. Fixing that means a manual compile of the module (i.e., through CPAN), which drags in a slew of conflicting dependencies. If I were talking about an incredibly obscure feature I’d be OK with this, but I’m talking about ridiculously basic functionality like date manipulation and array printing.

With PHP the only add-ons we have to worry about are Xdebug (an elaborate performance profiling suite that we use on our test machine) and APC (an opcode cache that approximately quadruples performance).

Is it a problem that PHP is full-featured right out of the box? Heck no! I’d much rather have a sensible set of features with a 5 MB executable (ZOMG THATS SO MUCH RAM!!!) than have a uselessly-basic language with tons of self-compiled modules, making scripts non-portable and generally making life hell for me as the person running development and operations at a company with 20+ servers running various versions of various OSes.

The other complaint I hear about PHP also makes no sense to me: whining about function naming and return values. No matter what language I program in, be it PHP, Perl, C, or C++, I *always* use an IDE or good reference (e.g., MSDN or php.net) to double-check function names, parameters, and return values. PHP.net is a great resource for finding functions and how to use them.

I don’t care how “logical” Perl’s argument order or function names are claimed to be. I still don’t know what they are without checking somewhere. So what does Perl 6’s restructuring get us? Everyone still has to look up functions to check naming and argument order, but now every pre-6 Perl script is going to not just break, but break spectacularly, with difficult-to-debug wrong-argument-order problems. And if you’re working with both Perl 5 and Perl 6, you now have to remember TWO versions of the same function. How is that an improvement?

I had the pleasure of hearing Rasmus Lerdorf give a keynote at the DC PHP Conference. He talked about how hard it was for him to learn English. It’s just not logical! Still, it’s far easier to learn and use English than to “break” every book, movie, webpage, etc. to “correct” the language.

Perl was one of the first languages I learned. I used it for years before I started using PHP at my current job. Perl was an incredible improvement over C and C++ with its fast regular expression and string parsing, and its two-dimensional associative arrays. But it’s also an ugly, hackish, difficult-to-read and difficult-to-use language. I’ve had far more success personally and throughout my company with PHP. It’s got all the benefits of Perl, but with a solid foundation of functionality, reasonable backwards-compatibility between versions, and none of the ugly “$_” hacks that make Perl a write-once-read-never language. You can write bad code in any language, but Perl demands it.