Dec 31, 2011

The Scalability Checklist

This is a mega summary of the most useful points I've found for scalability in quite some time. It mainly outlines ideas to prepare your applications for horizontal scale-out, and I use it as a reminder/checklist all the time. I've organized the points into different categories for easy reference depending on the situation.


You'll probably disagree with the ideas posted here or may have some of your own; either way, I encourage you to comment to make this list richer. In the end, the main goal of a scalable system is to maximize throughput and minimize response time with the minimum resource utilization.


NOTES:

  • I've tried to keep this post clear and concise (as a checklist) so I left many explanations out.
  • Some of the points are targeted towards managed languages like Java or .NET, but they should still be helpful if you're using other technologies.

Software Architecture/Design
  • Partition your application into logical layers which can be scaled out to physical tiers when necessary. At the very least define: presentation, business logic, data access and DB layers initially.
  • Make sure the above mentioned layers are testable independently.
  • Low coupling: connect your applications via services exposing interfaces with well defined contracts
  • High cohesion: design stateless services when possible, minimize their scope and promote reusability between them.
  • For internal application services, promote the use of TCP or binary connections/serialization (make sure your choice is load balancer compatible). Use SOAP and REST when interoperability is a concern.
  • Keep the business logic in the middle layer/tier. Clients should focus on presentation and authorization (trusted subsystems/facades)
  • Avoid or minimize roundtrips between layers/tiers as much as possible:
    • Use asynchronous calls from the browser instead of full page postbacks
    • Promote middle tier asynchronous service calls
    • Cache resources in the middle tier to avoid DB roundtrips and IO
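
As a minimal sketch of the asynchronous middle-tier calls mentioned above (assuming Java 8 or later for CompletableFuture), the following issues two independent service calls concurrently and combines their results, so the caller waits roughly for the slower call instead of the sum of both. OrderSummaryService, loadCustomerName and loadOrderCount are hypothetical stand-ins for real service or DB calls.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

final class OrderSummaryService {
    private final ExecutorService pool;

    OrderSummaryService(ExecutorService pool) {
        this.pool = pool;
    }

    // Hypothetical remote call; a real one would hit another service or the DB.
    String loadCustomerName(int customerId) {
        return "customer-" + customerId;
    }

    // Hypothetical remote call, independent of the one above.
    int loadOrderCount(int customerId) {
        return customerId * 2;
    }

    // Starts both calls on the pool and combines the results once both complete.
    CompletableFuture<String> summary(int customerId) {
        CompletableFuture<String> name =
                CompletableFuture.supplyAsync(() -> loadCustomerName(customerId), pool);
        CompletableFuture<Integer> orders =
                CompletableFuture.supplyAsync(() -> loadOrderCount(customerId), pool);
        return name.thenCombine(orders, (n, o) -> n + " has " + o + " orders");
    }
}
```

A caller can chain further work with thenApply/thenAccept rather than blocking on the result, which keeps the request thread free while the calls are in flight.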


Memory management
  • Clean up unmanaged/disposable resources as soon as possible. Instantiate late, dispose early
  • Avoid unnecessary boxing/unboxing
  • Avoid excessive string concatenations, use string buffers/builders for this purpose
  • Do not use exceptions for validation or control flow; check conditions explicitly instead
  • Remove event handlers as soon as they're not in use
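
A small sketch of two of the points above in Java (7 or later): try-with-resources for "instantiate late, dispose early", and a StringBuilder instead of repeated string concatenation. ReportWriter and ReportDemo are made-up names for illustration; the close() body stands in for releasing a real file handle or socket.

```java
import java.util.List;

final class ReportWriter implements AutoCloseable {
    private boolean closed;

    boolean isClosed() {
        return closed;
    }

    // Joins the lines with a single StringBuilder instead of repeated String '+',
    // which would allocate a new intermediate String on every iteration.
    String render(List<String> lines) {
        if (closed) {
            throw new IllegalStateException("writer already closed");
        }
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    @Override
    public void close() {
        closed = true; // a real implementation would release a file handle or socket here
    }
}

final class ReportDemo {
    // The writer exists only inside the try block and is closed automatically,
    // even if render() throws.
    static String renderOnce(List<String> lines) {
        try (ReportWriter writer = new ReportWriter()) {
            return writer.render(lines);
        }
    }
}
```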


Multithreading/Concurrency
  • Too many threads consume resources, increase context switching and contention, and overuse the CPU.
  • Too few threads limit throughput unnecessarily and underuse the CPU.
  • Avoid locks (pessimistic) where you can; prefer optimistic, lock-free patterns such as Compare-And-Swap (CAS) when possible (use Atomic variables).
  • If a lock is necessary, keep its scope as small as possible (e.g. do not use synchronized methods in Java).
  • Do not lock static methods; this locks on the class itself, so all "instances" of the class contend for the same lock
  • If you have a long-running single-threaded task and multi-core server(s), analyze the possibility of parallelizing it.
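
To make the CAS point concrete, here is a minimal sketch of an optimistic update loop built on AtomicInteger.compareAndSet. BoundedCounter is a hypothetical example class (think of it as a simple admission limit); no thread ever blocks on a lock, a losing thread simply retries.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class BoundedCounter {
    private final AtomicInteger value = new AtomicInteger();
    private final int max;

    BoundedCounter(int max) {
        this.max = max;
    }

    // Optimistic update: read, compute, then publish only if nobody else changed
    // the value in the meantime; otherwise retry.
    boolean tryIncrement() {
        while (true) {
            int current = value.get();
            if (current >= max) {
                return false; // limit reached, nothing to update
            }
            if (value.compareAndSet(current, current + 1)) {
                return true;
            }
            // CAS failed because another thread won the race; loop and retry.
        }
    }

    int get() {
        return value.get();
    }
}
```

Under very heavy contention the retry loop can spin, so this is a trade-off to measure rather than a universal replacement for locks.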


Data access
  • Preferably do not use ORMs
  • Use stored procedures, do not use inline SQL
  • Keep an eye on large result sets, page/cache them efficiently
  • Keep an eye on the isolation level at all times
  • Focus on connection pooling, use as few trusted identities as possible when connecting to the DB
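
As a rough sketch of paging a large result set through a stored procedure, the example below computes the limit/offset for a page and passes them to a JDBC CallableStatement. The stored procedure list_customers(limit, offset), the name column and the class names PageRequest and CustomerDao are all hypothetical; adapt them to your own schema.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class PageRequest {
    private final int pageNumber; // zero-based
    private final int pageSize;

    PageRequest(int pageNumber, int pageSize) {
        if (pageNumber < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageNumber must be >= 0 and pageSize > 0");
        }
        this.pageNumber = pageNumber;
        this.pageSize = pageSize;
    }

    int limit() {
        return pageSize;
    }

    long offset() {
        return (long) pageNumber * pageSize;
    }

    // Number of pages needed for totalRows rows (rounds up).
    static long totalPages(long totalRows, int pageSize) {
        return (totalRows + pageSize - 1) / pageSize;
    }
}

final class CustomerDao {
    // Fetches a single page; only pageSize rows travel from the DB to the middle tier.
    static List<String> fetchNames(Connection conn, PageRequest page) throws SQLException {
        // Hypothetical stored procedure taking (limit, offset).
        try (CallableStatement call = conn.prepareCall("{call list_customers(?, ?)}")) {
            call.setInt(1, page.limit());
            call.setLong(2, page.offset());
            try (ResultSet rs = call.executeQuery()) {
                List<String> names = new ArrayList<>(page.limit());
                while (rs.next()) {
                    names.add(rs.getString("name"));
                }
                return names;
            }
        }
    }
}
```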


System Architecture

  • Clustering.
    • If your app server supports it, try to take advantage of this.
    • Configure your Presentation Tier servers as Web Farms.
  • Load Balancing. If clustering is not possible...
    • For stateless services, scale out the servers by "cloning" them and load balance the requests.
    • For stateful services, maintain session information in a coherent cache.
    • NOTE: Session management is most effective when done in the Middle Tier; Web Farms should be kept as stateless as possible. This shields the client servers from the internals that deal with the data, making your application more maintainable.
  • Caching.
    • Cache as much data as possible, starting with Reference Data
    • For a multiple node cluster/system, use a distributed cache
    • Define the cache invalidation strategies carefully, such as activation/passivation and expiration and eviction (e.g. LRU, LFU, etc.).
  • Non-blocking I/O based application servers
    • Almost all app servers at the time of writing support NIO operations; however, it is always good to double check.
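
Going back to the eviction strategies in the Caching item, here is a minimal single-node LRU sketch using the JDK's LinkedHashMap in access order. LruCache is an illustrative name, and this is a local, non-thread-safe cache, not a distributed one; it only shows what LRU eviction means in practice.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int capacity;

    LruCache(int capacity) {
        // accessOrder=true: iteration order goes from least to most recently used.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    // Called by LinkedHashMap after each insertion; returning true evicts
    // the least recently used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```

For use from multiple threads it would need external synchronization, for example by wrapping it with Collections.synchronizedMap.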



Databases

  • Separate OLTP and OLAP clearly and tune each one independently
  • Schema design
    • Look to achieve at least 3rd normal form in OLTP databases.
    • Analyze stored procedure queries and performance on an individual basis, choose your indexes wisely.
  • To spread out reads:
    • Use Replication (master-slave)
  • To spread out writes:
    1. Use transparent horizontal table partitioning to different file groups/disks (e.g. SQL Server partitioned views or MySQL partitioning options)
    2. Use DB sharding. NOTE: DB routing logic should be kept in the Middle Tier
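
As a sketch of what "routing logic in the Middle Tier" can look like, the class below maps a key (say, a customer id) to one of N shards with a simple hash-modulo scheme. ShardRouter and the shard URLs are hypothetical, and modulo hashing is just the simplest option, not necessarily the one you want in production.

```java
import java.util.ArrayList;
import java.util.List;

final class ShardRouter {
    private final List<String> shardUrls;

    ShardRouter(List<String> shardUrls) {
        if (shardUrls.isEmpty()) {
            throw new IllegalArgumentException("at least one shard is required");
        }
        this.shardUrls = new ArrayList<>(shardUrls);
    }

    // Maps a key to a shard index. Math.floorMod keeps the result non-negative
    // even when hashCode() is negative.
    int shardIndexFor(String key) {
        return Math.floorMod(key.hashCode(), shardUrls.size());
    }

    // Returns the connection URL of the shard that owns the key.
    String shardUrlFor(String key) {
        return shardUrls.get(shardIndexFor(key));
    }
}
```

Keep in mind that with plain modulo most keys move to a different shard when the number of shards changes; consistent hashing or a lookup directory are common ways around that.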

Dec 27, 2011

How to set up a complete Java EE development environment in Linux (Java EE 6, Glassfish, MySQL, Eclipse/OEPE and Ubuntu)

I've found that setting up a whole Java EE development environment from scratch can be a long and sometimes confusing process, especially for newcomers.

In this post, I aim to provide straightforward, down to earth consolidated step lists to get a full Java EE development workstation up and running in no time. I tried to be as detailed as possible and yet concise, so you'll see everything from download links to installation guide summaries to necessary dependencies and workarounds, all in the right order. Basic knowledge of Linux is required though.


To make this whole description more compact I'll be installing all the components in a single workstation, simulating a complete local development environment. However, this won't stop you from installing components (such as the MySQL database) onto shared and separate development servers.

These are the components I chose (the latest versions at the time of writing). Whether you pick the same ones or not will determine how much your mileage varies; it's up to you.
  • Ubuntu Linux 11.10 (Oneiric Ocelot)
  • Oracle/Sun JDK 7
  • Oracle/Sun Glassfish 3.1.1
  • MySQL 5.5.17
  • Eclipse JEE Indigo
  • Oracle Enterprise Pack for Eclipse (OEPE)



Notes:
  1. To set up this dev environment as quickly and clearly as possible, I decided to go 100% with binary packages this time. I didn't want to get into the business of documenting source configurations/options/flags/dependencies that would most likely be useless to the reader and would make this post longer.
  2. Using the latest binaries means I'm not installing the main components from the Ubuntu repositories, but from the downloaded versions. The advantage of doing this is that we get to install the exact versions we want; the downside is that we can't rely on the Ubuntu package management system, but this is not a problem. It also helps make this post as generic as possible.
  3. You could pick another Application Server of your choice; the bottom line remains the same: install it and configure its JDBC access to communicate with MySQL
  4. You might not want to use Eclipse, but Netbeans instead. In this case, you won't need OEPE since Netbeans is Java EE ready and comes completely integrated with Glassfish. I chose to use Eclipse since I think it's more widely used and I'm simply more comfortable with it.
  5. Sorry for the lack of screen shots, I wanted to keep this post easy to navigate. It should still be easy to read though.

A note on virtualization

For most cases I choose to virtualize my workstations. The benefits are out of the scope of this post, but I highly recommend it due to the flexibility it provides to change things around.

There are many hypervisors out there, but for Ubuntu I used VirtualBox. It's free and yet includes plenty of features; it was the only solution that really supported 3D acceleration with Ubuntu (to use Unity 3D).

Let's get started...


Operating System

For this I decided to work on Ubuntu 11.10 (Oneiric Ocelot). The Ubuntu guys have worked really hard to make their distribution extremely easy to install; just go to their website and pick your installation "flavor". Once you've installed it and are happy with your window manager, make sure your software is up to date.



Downloads


MySQL Installation Steps


Just one note: make sure there is no my.cnf anywhere (e.g. /etc/my.cnf, /etc/mysql/my.cnf, /usr/local/mysql/my.cnf, etc.). My particular Linux installation had a bogus version at /etc/mysql/my.cnf that complicated things for a while... yes, before installing MySQL at all.


The following command list is basically a reviewed version of the one in the install readme.
  • Install libaio (apt-get install libaio1).
  • Unpack the mysql-5.5.17-linux2.6-x86_64.tar.gz binaries folder to a destination of your preference (e.g. /usr/local or /opt)
  • Run the following commands:
    • groupadd mysql
    • useradd -r -g mysql mysql
    • ln -s INSTALL_DIR /usr/local/mysql
    • cd /usr/local/mysql
    • chown -R mysql:mysql .
    • ./scripts/mysql_install_db --user=mysql
    • chown -R root .
    • chown -R mysql data
    • cp support-files/my-medium.cnf /etc/my.cnf (you can copy any one of the other templates or use your own)
    • Test the installation with: ./bin/mysqld_safe --user=mysql
    • ./bin/mysql_secure_installation
    • cp support-files/mysql.server /etc/init.d/


Installing Java EE 6
  • Unpack the JDK 7 to a directory of your choice (e.g. /opt/jdk-1.7.0) and add its bin directory to the PATH.
  • Install Java EE by running the java_ee_sdk-6u3-unix.sh script. Add the suggested directories to the PATH. This post assumes Glassfish will be installed under /home/[user]/glassfish3.
  • Install the ia32-libs package in Ubuntu (apt-get install ia32-libs). This is necessary to run the updatetool.
  • Use the updatetool to:
    • Install the Java 6 Tutorial component, listed under Available Updates (optional)
    • Install Ant, under Addons (optional)


Configure JDBC for MySQL in Glassfish
  • For complete details refer to this post: http://www.albeesonline.com/blog/2008/08/06/creating-and-configuring-a-mysql-datasource-in-glassfish-application-server/
  • Extract mysql-connector-java-5.1.18-bin.jar from the Connector/J download and copy it to the domain lib/ext directory (e.g. /home/[user]/glassfish3/glassfish/domains/domain1/lib/ext)
  • Restart (or start) Glassfish: /home/[user]/glassfish3/bin/asadmin restart-domain
  • Open the Glassfish admin website (http://localhost:4848), provide the credentials if necessary
  • From the left pane menu, expand Common Tasks -> Resources -> JDBC and click on JDBC Connection Pools
  • On the right pane, click New:
    • Indicate a Pool Name (e.g. MySQLPool)
    • Resource Type: select javax.sql.ConnectionPoolDataSource
    • Database Driver Vendor: select MySql
    • Click Next
    • Ping: check Enabled
    • Adjust configuration parameters as necessary
    • At a minimum set the following Additional Properties:
      • User (e.g. root)
      • ServerName (e.g. localhost)
      • DatabaseName (e.g. test)
      • Password (your password)
      • Url (e.g. jdbc:mysql://:3306/test)
      • URL (e.g. jdbc:mysql://:3306/test)
    • Click Finish
    • Once you're back in the Connection Pools page, click the newly created MySQLPool
    • Click Ping; if all went well, you'll see a "Ping Succeeded" message
  • On the left pane, click on JDBC Resources:
    • On the right pane click New
    • JNDI Name: jdbc/MySQLTest (assuming this resource represents a connection to the Test database, you may use any name though)
    • Pool Name: MySQLPool
    • Click OK
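
Once the resource exists, application code can reach the pool through its JNDI name. The sketch below assumes the jdbc/MySQLTest name used above; PoolPing is a made-up helper class, and the lookup only works for code running inside the application server (inside managed components you would normally have the DataSource injected with @Resource instead).

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

final class PoolPing {
    // Looks up the JDBC resource by the JNDI name configured in the admin console.
    static DataSource lookup() throws NamingException {
        return (DataSource) new InitialContext().lookup("jdbc/MySQLTest");
    }

    // Borrows a connection from the pool, checks it, and hands it back
    // (close() on a pooled connection returns it to the pool).
    static boolean ping(DataSource dataSource) {
        try (Connection conn = dataSource.getConnection()) {
            return conn.isValid(2); // timeout in seconds
        } catch (SQLException e) {
            return false;
        }
    }
}
```

Calling PoolPing.ping(PoolPing.lookup()) from a servlet or EJB is roughly the programmatic equivalent of the Ping button in the admin console.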


Eclipse Installation and Configuration Steps


Eclipse doesn't really need any special attention: just unpack the binaries to a directory of your choice, fire up the eclipse binary and select the workspace location. That's it.
  • OEPE Installation
  • Add the glassfish server
    • In the Servers tab, right click and select New
    • Use the "Select the server type" filter to look for GlassFish
    • Select GlassFish 3.1.1
    • Select the Glassfish application directory. Assuming it's installed in /home/[user]/glassfish3, the app dir will be /home/[user]/glassfish3/glassfish. Note: the domain directory (e.g. /home/[user]/glassfish3/glassfish/domains/domain1) must be writable by the user running eclipse.
    • Click Next until Finish
  • Configure MySQL:
    • In the Data Source Explorer, right click on Database Connections and click New.
    • Select MySQL from the Profile Types, Next.
    • Click on the New Driver Definition button.
    • In the Name/Type tab, choose the MySQL JDBC Driver 5.1 version.
    • In the JAR List tab, remove any default mysql-connector you see.
    • Click on the Add JAR/Zip button and select the mysql-connector-java-5.1.18-bin.jar file
    • In the Properties tab, indicate the jdbc Connection URL to your DB (e.g. jdbc:mysql://localhost:3306/test), a Database name, a User ID and Password.
    • Click OK, ping to test connectivity
  • Download libsvn-java (JavaHL)
    • This is a quick and dirty way to install this dependency in Ubuntu. We're doing this because, remember, we're not using Ubuntu's package management system. If we were to install libsvn-java with apt-get, we would pull in a whole lot of Java dependencies we already installed on our own.
    • Extract the /usr/lib/jni directory from the Deb package and copy it to /usr/lib/jni
    • Add -Djava.library.path=/usr/lib/jni to the eclipse.ini in the installation directory

That's it, your new environment is now ready to test, prototype, demo or develop your Java EE applications.


In the future, I'll probably add notes and steps for Netbeans and other application servers (e.g. JBoss).