I always get distressed when I read about the latest example of someone cracking a commercial system, stealing credit card numbers, etc. There's absolutely no excuse for this! We in the industry should have the knowledge and skills to preclude the possibility of anyone hacking into back-end systems. Those of us who've been at it for a number of years have learned the architectures and techniques for protecting information. I can only surmise that the people responsible for some of these systems have not had the benefit of experience. I don't personally know anyone who would make the kind of obvious mistakes we read about on a frighteningly frequent basis.
Firewalls are powerful tools which can be used to limit the port numbers
which are exposed to the Internet. In the case of a WWW server, the only
port which should be accessible is port 80, plus port 443 if you're
using HTTPS. Some servers expose a wide variety of ports and protocols by
default. While I consider this to be a huge mistake from an architectural
standpoint, using a firewall can prevent external access to these ports.
While internal servers are reasonably safe from hacking, external-facing
servers need to be hardened against any possible attacks.
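To make the default-deny idea concrete, here is a minimal Java sketch of an inbound port allowlist. The class and method names are hypothetical, and a real deployment would express this as firewall rules (for example with iptables) rather than application code; the sketch only shows the policy: everything is dropped unless it is explicitly permitted.

```java
import java.util.Set;

// Hypothetical model of the external firewall's inbound policy:
// drop everything except the ports a web server actually needs.
class ExternalFirewall {
    // Port 80 for HTTP, 443 for HTTPS; every other port is closed.
    private static final Set<Integer> ALLOWED_PORTS = Set.of(80, 443);

    static boolean permits(int destinationPort) {
        return ALLOWED_PORTS.contains(destinationPort);
    }

    public static void main(String[] args) {
        // Probe a few ports a default server install might expose.
        int[] probes = {22, 80, 443, 1521, 3389};
        for (int port : probes) {
            System.out.println("port " + port + ": "
                    + (permits(port) ? "ACCEPT" : "DROP"));
        }
    }
}
```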
I should admit my bias at the outset. I don't consider Microsoft's
products to be very secure. One simply has to look at the frequency
of security updates to the underlying software and applications. I
wouldn't even dream of using IIS as an external server, for example.
All I have to do is look at my Apache webserver logs to see the typical
attacks which are designed to exploit security holes in IIS. Microsoft
programmers also don't seem to have learned how to handle buffer overruns.
That's one of the advantages of using Java over languages such as C and
C++. The platform manages string storage and checks every array index,
rather than requiring the programmer to allocate and manage a fixed-length
buffer.
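The difference is easy to demonstrate. In this small sketch (the class and method names are made up for illustration), copying an oversized string into a fixed-length array raises an exception at the first out-of-range index instead of silently overwriting adjacent memory, as an unchecked copy into a C array can. The idiomatic alternative is to let a StringBuilder size its own storage.

```java
// Illustrates Java's bounds checking on a fixed-length buffer.
class BoundsDemo {
    // Copies src into dest one character at a time; Java checks every index.
    static void copyInto(char[] dest, String src) {
        for (int i = 0; i < src.length(); i++) {
            dest[i] = src.charAt(i); // throws rather than writing past the end
        }
    }

    public static void main(String[] args) {
        char[] buffer = new char[8];
        try {
            copyInto(buffer, "this input is far too long");
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("overrun caught: " + e.getMessage());
        }

        // The idiomatic approach: let the platform size the storage.
        StringBuilder sb = new StringBuilder(8);
        sb.append("this input is far too long"); // capacity grows as needed
        System.out.println(sb.length());
    }
}
```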
I use an Apache/Tomcat combination as my front-line server. While it doesn't
offer full J2EE functionality, it's a high-performance HTTP server combined
with the reference implementation of a servlet container. You can use JDBC
to connect to a database or RMI/IIOP to communicate with a
back-end J2EE server. And that's one of the keys to designing a secure
network architecture. Using a second firewall between your front-end and
back-end servers, and limiting the ports which are exposed, ensures that
there's no direct access to the back-end servers from the 'net. Front-end
servers typically reside in what is commonly referred to as the
"demilitarized zone" or DMZ. Here's a diagram.
In this architecture, the external firewall would typically only pass
requests for ports 80 and 443. If the back-end server were running Oracle
and you were using JDBC, then only port 1521 would be enabled on the
internal firewall. Since the front-end servers should be equipped with
dual network interfaces, it's easy to also limit access by IP address
on the internal firewall. Finally, on robust systems such as Red Hat
Linux, you can use iptables to serve as the firewall. Since it functions
at the network interface layer, you can actually combine the firewall
and front-end functionality in a single server. The disadvantage is that
you can't use load balancing in that scenario.
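The internal firewall's rule is stricter than the external one because it can match on both the port and the source address. Here is a minimal Java model of that rule. The 192.168.10.x addresses for the front-end servers' internal interfaces are assumptions for illustration, and as before, a real system would implement this as firewall rules, not Java.

```java
import java.util.Set;

// Hypothetical model of the internal firewall between the DMZ and the
// back-end network. Addresses are illustrative assumptions.
class InternalFirewall {
    // Oracle listener port, as in the JDBC example in the text.
    private static final int ORACLE_PORT = 1521;

    // Assumed addresses of the front-end servers' internal interfaces.
    private static final Set<String> FRONT_END_HOSTS =
            Set.of("192.168.10.11", "192.168.10.12");

    // Traffic must match BOTH the port and an approved source address.
    static boolean permits(String sourceIp, int destinationPort) {
        return destinationPort == ORACLE_PORT
                && FRONT_END_HOSTS.contains(sourceIp);
    }

    public static void main(String[] args) {
        System.out.println(permits("192.168.10.11", 1521)); // front end to Oracle
        System.out.println(permits("192.168.10.11", 22));   // right host, wrong port
        System.out.println(permits("192.168.10.99", 1521)); // unknown host
    }
}
```

Requiring both conditions means that even a compromised machine elsewhere in the DMZ can't reach the database unless it also holds one of the approved addresses.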
Load balancing is another complex topic and actually impacts how one
designs an application. There are two types of load balancers: stateful
and stateless. In the first, the load balancer examines the source address
of incoming IP packets and routes each client's requests to the same server
every time. This can make it easier for developers to maintain session data
on a single server. The drawback is that if a server goes down, its session
data is lost and the users routed to that machine will have to log in
to the application again. What I consider to be a better approach is
one which permits users to access any front-line server, regardless of
which server handled their previous requests.
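The two routing strategies can be contrasted in a few lines of Java. This is only a sketch: the server names are hypothetical, and hashing the client IP is just one common way a stateful ("sticky") balancer can pin a client to a server. A production load balancer does this in the network device, not in application code.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Contrasts stateful (sticky) and stateless (round-robin) routing.
// Server names are hypothetical.
class LoadBalancers {
    static final List<String> SERVERS = List.of("web1", "web2", "web3");

    // Stateful: the same client IP always maps to the same server, so
    // session data held in that server's memory keeps working.
    static String sticky(String clientIp) {
        int index = Math.floorMod(clientIp.hashCode(), SERVERS.size());
        return SERVERS.get(index);
    }

    // Stateless: round-robin, ignoring who the client is. Any server may
    // receive any request, so session data can't live in one server's memory.
    private static final AtomicInteger next = new AtomicInteger();

    static String roundRobin() {
        int index = Math.floorMod(next.getAndIncrement(), SERVERS.size());
        return SERVERS.get(index);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            System.out.println("sticky:      " + sticky("203.0.113.7"));
            System.out.println("round-robin: " + roundRobin());
        }
    }
}
```

The sticky version repeats the same server name every time, while round-robin cycles through all three, which is exactly why the stateless approach forces session data out of the front-end servers.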
And this is where load balancing impacts the application architecture.
J2EE servers have the ability to persist session data to a database.
Whether using URL rewriting or cookies, a session reference can be
maintained by the client and servers can retrieve the data from the
back-end database. If a developer opts to maintain a lot of data in
the session, then there's obviously going to be a performance impact with
my preferred approach. The solution is simple, however: keep your
session data to a minimum. The caching capabilities of a modern RDBMS
should be able to serve up the data in a fraction of a second. What I
like about stateless load balancers is the ability to add front-end
servers according to load or take them down for required maintenance
with no impact to users.
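Here is a minimal sketch of the "any server, any request" approach. A ConcurrentHashMap stands in for the back-end database table; in a real deployment that role would be played by JDBC calls or by the J2EE container's own session-persistence feature. The class names and the one-string session payload are hypothetical. The point is that the front-end servers hold no session state, so one can be taken down without logging anybody out.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

class SharedSessions {
    // Hypothetical stand-in for a SESSIONS table in the back-end RDBMS:
    // session id -> user name. Not a real database.
    static final Map<String, String> sessionTable = new ConcurrentHashMap<>();

    // A front-end server that keeps no session state in its own memory.
    static class FrontEnd {
        private final String name;

        FrontEnd(String name) {
            this.name = name;
        }

        // Creates a session; the id would travel to the browser as a cookie
        // or via URL rewriting. Only a minimal value is stored.
        String login(String user) {
            String sessionId = UUID.randomUUID().toString();
            sessionTable.put(sessionId, user);
            return sessionId;
        }

        // Any front-end server can look the session up in the shared store.
        String handle(String sessionId) {
            String user = sessionTable.get(sessionId);
            return user == null ? "please log in" : name + " serving " + user;
        }
    }

    public static void main(String[] args) {
        FrontEnd web1 = new FrontEnd("web1");
        FrontEnd web2 = new FrontEnd("web2");

        String cookie = web1.login("alice");
        // Suppose web1 is now taken down for maintenance; the load balancer
        // sends the next request to web2, which still finds the session.
        System.out.println(web2.handle(cookie)); // prints "web2 serving alice"
    }
}
```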
So now we have no direct access to the internal servers from the 'net.
They're blocked by both port numbers and IP addresses. Using a private
IP address range (such as 192.168.x.x), which isn't routed on the public
Internet, means that nothing coming from the 'net can masquerade as one
of your front-end servers. And since only ports 80 and 443 are allowed in
through the external firewall, outside traffic couldn't reach port 1521
on the internal firewall anyway.
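The anti-spoofing half of that argument relies on the external firewall dropping any packet from the Internet that claims a private source address. The check itself is simple; this sketch uses the standard InetAddress methods, where isSiteLocalAddress() is true for the IPv4 ranges 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16. The class name is hypothetical, treating loopback addresses the same way is my own addition, and the input is assumed to be an IP literal.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Ingress check for the external interface: a packet arriving from the
// Internet should never carry a private or loopback source address.
class IngressFilter {
    static boolean dropFromInternet(String sourceIp) throws UnknownHostException {
        // For an IP literal, getByName() only parses it; no DNS lookup occurs.
        InetAddress addr = InetAddress.getByName(sourceIp);
        // Private (RFC 1918) ranges plus loopback are never legitimate here.
        return addr.isSiteLocalAddress() || addr.isLoopbackAddress();
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println(dropFromInternet("192.168.10.11")); // spoofed front-end address
        System.out.println(dropFromInternet("203.0.113.7"));   // ordinary public address
    }
}
```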
One of the dangers of not limiting access via the external firewall
is that other protocols could be used to breach security on your
front-end servers. Skilled hackers could then inject traffic on the
second network interface to gain access to back-end servers.
Security needs to be an overarching concern when building web
applications. From using HTTPS when requesting personally-identifiable
information to using encryption to store credit card numbers, people
have a right to expect that a company is doing everything in its power
to secure information. While some might suggest that the hackers are
more capable than the custodians of information, I would disagree.
We have learned much about ways to secure information. Tools and
techniques exist to prevent access to sensitive data. We need to
ensure that we apply them intelligently or else run the risk of a loss
of faith in the entire e-commerce industry.
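As an example of applying those tools, here is a sketch of encrypting a card number before it is written to the database, using AES in GCM mode from the standard Java Cryptography Extension. The CardVault class and its storage layout (the IV prepended to the ciphertext) are my own assumptions, and key management, which is the hard part in practice, is deliberately left out.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical helper: AES-GCM encryption of a card number for storage.
class CardVault {
    private static final int IV_BYTES = 12;  // recommended GCM nonce size
    private static final int TAG_BITS = 128; // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(256);
        return gen.generateKey();
    }

    // Returns IV followed by ciphertext, suitable for a single column.
    static byte[] encrypt(SecretKey key, String cardNumber) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv); // fresh IV each time; never reuse one with the same key
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = cipher.doFinal(cardNumber.getBytes(StandardCharsets.UTF_8));
        byte[] out = new byte[iv.length + ct.length];
        System.arraycopy(iv, 0, out, 0, iv.length);
        System.arraycopy(ct, 0, out, iv.length, ct.length);
        return out;
    }

    // Splits off the IV and decrypts; tampered data fails the tag check.
    static String decrypt(SecretKey key, byte[] stored) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key,
                new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
        byte[] pt = cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
        return new String(pt, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        SecretKey key = newKey();
        byte[] stored = encrypt(key, "4111111111111111"); // well-known test number
        System.out.println(decrypt(key, stored));
    }
}
```

GCM is chosen over a plain block mode because it authenticates as well as encrypts: if the stored bytes are altered, decryption throws instead of returning garbage.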
I could probably write an entire article about this topic but will
add it here for your consideration. As I mentioned above, I don't
consider IIS to be a suitable host for "industrial strength"
applications. I don't even consider it appropriate for entirely
internal applications. And that's why I'm perpetually surprised by the
number of job postings I see these days which require both J2EE
and .NET/ASP. As far as I'm concerned, you either commit to a
platform-agnostic solution like J2EE or you choose the proprietary
Microsoft approach. I've found that it's often a question of personal
philosophy. That could explain why nobody I know personally has embraced
both. You're either in one camp or the other, so requiring both doesn't
make a great deal of sense to me.
Part of the problem, as I see it anyway, is that some companies
have chosen the proprietary approach because it's quick and easy,
not to mention cheap. Let's face it: VB programmers are a dime a dozen.
But applications written in Visual Basic are simply not what I
would choose to support critical business functions. Java rejects the
classic C-language mistake of writing an assignment where a comparison
was intended in an if statement (unless the variable is a boolean), and
type declarations such as ArrayList<String> can catch incorrect
assignments at the compilation stage. The more mistakes you can catch
in the development phase, the lower the likelihood of serious application
errors at the testing and deployment stages.
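Here is a small illustration of the point about typed declarations. With a raw (pre-generics) list the mistake compiles and only surfaces at run time, possibly in production; with List<String> the same line is rejected by the compiler, so it is left commented out below. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Compares a raw list with a typed one.
class TypedLists {
    @SuppressWarnings({"rawtypes", "unchecked"})
    static String firstFromRawList() {
        List names = new ArrayList(); // raw type: element type not checked
        names.add(42);                // wrong type, but it compiles
        return (String) names.get(0); // ClassCastException at run time
    }

    static String firstFromTypedList() {
        List<String> names = new ArrayList<>();
        // names.add(42);             // would not compile: int is not a String
        names.add("alice");
        return names.get(0);          // no cast needed
    }

    public static void main(String[] args) {
        System.out.println(firstFromTypedList());
        try {
            firstFromRawList();
        } catch (ClassCastException e) {
            System.out.println("raw list failed only at run time");
        }
    }
}
```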
Finally, it surprises me how many companies require skills in
technologies which I consider to be obsolete. Perl was one of the first
languages used to generate dynamic content on webservers. It was also
used by UNIX system administrators for generating reports. It's not a
strongly typed language, so it's easy to make mistakes. PHP originally
stood for Personal Home Page and was designed along similar
lines. ColdFusion was an attempt to improve the situation
but ultimately couldn't support applications at the enterprise
scale, IMHO. The volume of concurrent requests on the 'net these
days can tax even server solutions like J2EE. But at least J2EE is
scalable and, with appropriate knowledge of how to architect
solutions, can support incredible request volumes.
Of course not everyone will agree with these conclusions. Some can
reasonably suggest that solutions like WebSphere and WebLogic are
too expensive, even though OSS solutions like GlassFish are
readily available. My response would likely be that these same
people might try to use something like MySQL rather than a solid
RDBMS solution like Oracle or DB2. There's a big difference
between designing solutions which will serve hundreds or
thousands of requests per day as opposed to millions or tens of
millions. If your website is vital to your company and your
cash flow, and has to run 24x7x365, then you should select the most
robust solutions available. As always, YMMV.
April 29th, 2009
Copyright © 2009 by Phil Selby