Please Note: this page was last updated in 2000. Its conclusions are no longer relevant. Its data was dubious at the time and flat-out wrong now. I have no intention to update it, but I will gladly link to something more recent.

A subjective analysis of two high-level, object-oriented languages

Comparing Python to Java

Foreword

This paper is subjective, and as such, is not meant to be a serious technical discussion of the various merits of these languages. It is my own personal opinion, and we all know what opinions are like.

The purpose of this paper is to provide a perspective for people planning to start their own projects in one or the other of these languages. As such, if you are reading this and have a different paper that you'd like to submit as a counterexample, I will gladly link to it.

Section 1: Performance!

Every inexperienced programmer's first question when they start to pick up a new language is: how fast does it go? To answer this, I wrote a few test cases (the archive containing them is at the bottom of this document). I attempted to write functionally equivalent code in Python and Java. Here are the scores:

Times are measured in real ("wall clock") seconds, since this is a practical analysis. YMWV.

Test                         Java     Python   Comparison
Standard Output              138.85   30.58    Python 4.5X Faster than Java
Hashtable                    17.0     8.22     Python 2X Faster than Java
I/O                          56.72    47.36    Python 1.2X Faster than Java
List                         5.94     14.32    Java 2.4X Faster than Python
Native Methods               2.475    7.92     Java 3.2X Faster than Python
Interpreter Initialisation   0.25     0.04     Python 6.3X Faster than Java
Object Allocation            23.65    211.11   Java 8.9X Faster than Python
Interpreter Speed            0.43     2.29     Java 5.3X Faster than Python

Again, I must stress that these are approximate times, which are specific to my computer, my Java version (JDK 1.1.7B, blackdown), my operating system (Debian GNU Linux 2.2), my python version (Python 1.5.2) and my idiosyncratic test method, which is not strenuous at all. I repeated the tests 3 times and averaged the results to get the numbers you see here. Sources to the tests are available here in .tar.gz format, so you can perform them yourself.
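The "run it three times and average the wall-clock time" method described above can be sketched roughly as follows. This is a minimal illustration in modern Python 3, not the original test script: the function name `average_wall_time` and the choice to discard the child's output are assumptions made for the example.

```python
import subprocess
import sys
import time


def average_wall_time(command, runs=3):
    """Run `command` `runs` times and return the mean wall-clock seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Assumption: output is discarded so terminal speed doesn't skew the timing.
        subprocess.run(command, stdout=subprocess.DEVNULL, check=True)
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)


if __name__ == "__main__":
    # Hypothetical example: time an interpreter that starts up and does nothing.
    elapsed = average_wall_time([sys.executable, "-c", "pass"])
    print(f"{elapsed:.3f} s average over 3 runs")
```

Three runs is a very small sample, so numbers obtained this way should be read as rough indications only.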

The overall conclusion one may draw from these numbers is that Java is generally faster. If it's that simple a conclusion you're looking for, though, I recommend using C++ as your language -- neither Java nor Python can hold a candle to a true systems-level language. (I originally thought of including C or C++ comparisons here, but the difference is truly laughable, especially with compiler optimisations.)
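For readers who want to check the Comparison column, the ratios can be recomputed from the two timing columns. The sketch below (modern Python 3; the helper name `compare` is made up for this example) divides the slower time by the faster one for each test and counts how many tests each language wins.

```python
# (test name, Java seconds, Python seconds), copied from the table above.
RESULTS = [
    ("Standard Output", 138.85, 30.58),
    ("Hashtable", 17.0, 8.22),
    ("I/O", 56.72, 47.36),
    ("List", 5.94, 14.32),
    ("Native Methods", 2.475, 7.92),
    ("Interpreter Initialisation", 0.25, 0.04),
    ("Object Allocation", 23.65, 211.11),
    ("Interpreter Speed", 0.43, 2.29),
]


def compare(java_seconds, python_seconds):
    """Return (winner, ratio), where ratio = slower time / faster time."""
    if java_seconds < python_seconds:
        return "Java", python_seconds / java_seconds
    return "Python", java_seconds / python_seconds


wins = {"Java": 0, "Python": 0}
for name, java_seconds, python_seconds in RESULTS:
    winner, ratio = compare(java_seconds, python_seconds)
    wins[winner] += 1
    print(f"{name}: {winner} {ratio:.1f}X faster")

print(wins)
```

The tally counts wins per language and does not weigh them by margin, so it is only one of several ways to read the table.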


Test-By-Test

ConsoleTest
Python:
for x in xrange(1000000):
    print x

Java:
public class ConsoleTest {
    public static void main(String[] args) {
        for (int i = 0; i < 1000000; i++) {
            System.out.println(i);
        }
    }
}
    
The console test was impressive to me: Java's performance was astonishingly bad. Combined with the long initialisation time, this makes Java a completely unsuitable language for streams-based programming.

It might be worth noting here that System.out actually writes to stderr, which is highly confusing (and broken, IMHO) behaviour. Java was definitely not designed with shell scripting or piping in mind.

Hashtest
Python:
for i in xrange(1000):
    x={}
    for j in xrange(1000):
        x[j]=i
        x[j]

Java:
import java.util.Hashtable;

public class HashTest {
    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            Hashtable x = new Hashtable();
            for (int j = 0; j < 1000; j++) {
                x.put(new Integer(i), new Integer(j));
                x.get(new Integer(i));
            }
        }
    }
}
    
Here is one of the many places where Python benefits from having a standard data structure implemented in C; the built-in "{}" hashtable is one of the primary reasons I decided to move Twisted Reality to Python.

IOTest
Python:
f=open('scratch','wb')
for i in xrange(1000000):
    f.write(str(i))
f.close()

Java:
import java.io.*;

public class IOTest
{
    public static void main(String[] args) {
        try {
            File f = new File("scratch");
            PrintWriter ps = new PrintWriter(new OutputStreamWriter
                                             (new FileOutputStream(f)));
            for (int i = 0; i < 1000000; i++) {
                ps.print(String.valueOf(i));
            }
            ps.close();
        }
        catch(IOException ioe) {
            ioe.printStackTrace();
        }
    }
}

    
Python's I/O is also marginally faster than Java's -- considering that python's interpreter is vastly slower, this is really impressive. To be fair, this is a super-naive implementation of file access in Java. It's stream-based, and not buffered. However, the point of this exercise was mostly to test the most 'natural' way of doing things in each language.
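To make the buffered versus unbuffered distinction concrete, here is a small sketch using the `buffering` argument of Python's built-in `open()`. It is written for modern Python 3 rather than the 1.5.2 used in the tests, and the helper name `write_numbers`, the file names, and the 64 KB buffer size are arbitrary choices for illustration.

```python
import os
import tempfile


def write_numbers(path, count, buffering):
    """Write the decimal digits of 0..count-1 to `path` with the given buffering."""
    with open(path, "wb", buffering=buffering) as f:
        for i in range(count):
            f.write(str(i).encode("ascii"))


scratch_dir = tempfile.mkdtemp()
unbuffered = os.path.join(scratch_dir, "unbuffered")
buffered = os.path.join(scratch_dir, "buffered")

# buffering=0 sends every write() straight to the operating system
# (only allowed in binary mode).
write_numbers(unbuffered, 10000, buffering=0)

# A positive size collects writes in memory and flushes them in larger chunks.
write_numbers(buffered, 10000, buffering=64 * 1024)
```

Both calls produce identical files; only the number of underlying system calls differs, which is where a naive stream-based implementation tends to lose time.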
ListTest
Python:
for i in xrange(1000):
    v=['a','b','c','d','e','f','g']
    for j in xrange(1000):
        v.append(j)
        v[j]

Java:
import java.util.Vector;

public class ListTest {
    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            Vector v = new Vector();
            v.addElement("a");
            v.addElement("b");
            v.addElement("c");
            v.addElement("d");
            v.addElement("e");
            v.addElement("f");
            v.addElement("g");
            for (int j = 0; j < 1000; j++) {
                v.addElement(new Integer(j));
                v.elementAt(j);
            }
        }
    }
}
    
Java's list syntax is hideous. To my surprise, though, Vector performs better than Python's standard list and its [] operator; or at least, the Python list doesn't outperform Vector by enough to overcome the interpreter speed difference.

NativeTest
Python:
from pynative import *
for i in xrange(1000000):
    hello()

Java:
public class NativeTest
{
    public native void nativeMethod();
    static {
        System.loadLibrary("javanative");
    }
    public static void main(String[] args) {
        NativeTest nt = new NativeTest();
        for (int i = 0; i < 1000000; i++)
        {
            nt.nativeMethod();
        }
    }
}
    
Python C Module:
#include "Python.h"

static PyObject*
pynative_hello(self,args)
     PyObject *self;
     PyObject *args;
{
  printf("Hello, world!\n");
  Py_INCREF(Py_None);
  return Py_None;
}

static PyMethodDef NativeMethods[] = {
  {"hello", pynative_hello, METH_VARARGS},
  {NULL, NULL}, /* Sentinel... what's this? */
};

void
initpynative()
{
  (void) Py_InitModule("pynative", NativeMethods);
}
    
Java C Module:
--- Autogenerated NativeTest.h ---
/* DO NOT EDIT THIS FILE - it is machine generated */
#include 
/* Header for class NativeTest */

#ifndef _Included_NativeTest
#define _Included_NativeTest

#pragma pack(4)

typedef struct ClassNativeTest {
    char PAD;
/* ANSI C requires structures to have a least one member */
} ClassNativeTest;
HandleTo(NativeTest);

#pragma pack()

#ifdef __cplusplus
extern "C" {
#endif
extern void NativeTest_nativeMethod(struct HNativeTest *);
#ifdef __cplusplus
}
#endif
#endif
---


#include <stdio.h>
#include <jni.h>
#include "NativeTest.h"

JNIEXPORT void JNICALL
Java_NativeTest_nativeMethod(JNIEnv *env, jobject obj)
{
  printf("Hello world!\n");
}
    
Python's native interface requires no header-file-generation phase. One can design native *objects*, not merely native code. Furthermore, native modules are introspectable, and require no Python code to bind them. In short: the Python native interface is vastly superior to Java's. Not only that, but the function-call overhead appears to be lower: Python is still slower than Java in this test (3.2X), but by less than the raw interpreter speed difference (5.3X).
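The introspection point can be illustrated without building pynative at all. The sketch below uses the standard `math` module, which is implemented in C in CPython, purely as a stand-in for a hand-written extension; it is modern Python 3, and nothing in it comes from the original test archive.

```python
import inspect
import math  # stand-in for a C extension such as pynative (assumption)

# A function implemented in C is reported as a builtin, not a Python function.
print(inspect.isbuiltin(math.sqrt))

# The module's contents can be listed without any wrapper code.
public_names = [name for name in dir(math) if not name.startswith("_")]
print(public_names[:5])

# Docstrings defined in the C source are visible from the interpreter.
print(math.sqrt.__doc__)
```

The same calls work on any imported extension module, which is what "require no Python code to bind them" amounts to in practice.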
NoTest
Python: (an empty file)

Java:
public class NoTest { public static void main(String[] args){} }
    
This, I think, is an interesting commentary on the design of both languages. First of all, in order to do nothing in python, you write exactly that -- nothing. In order to successfully start up and do nothing in Java, you have to have a class definition with the correct name, with a main method...

Aside from the philosophical ramifications of this code, there is the very practical issue of Java's large initialization overhead. It takes too long to start a virtual machine to do anything like CGI or shell-scripting with Java. It seems to me that this is a pattern throughout the language: while it is designed to scale up, it seems poorly suited to scale *down* to anything smaller than a large application. I don't understand why they market it as a product for set-top boxes.

ObjectTest
Python:
class ObjectTest: pass

for i in xrange(1000):
    root=ObjectTest()
    for j in xrange(10000):
        root.next=ObjectTest()
        root=root.next

Java:
public class ObjectTest {
    public ObjectTest next;
    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            ObjectTest root = new ObjectTest();
            for (int j = 0; j < 10000; j++) {
                root.next=new ObjectTest();
                root=root.next;
            }
        }
    }
}
    
This test really surprised me. There's little I can say about it except that Python needs to improve -- the allocation time for instances of a completely empty class definition is unacceptably long.

SpeedTest
Python:
for x in xrange(1000000):
    pass

Java:
public class SpeedTest {
    public static void main(String[] args) {
        for (int i = 0; i < 1000000; i++);
    }
}

    
Java's interpreter is, as expected, vastly superior to Python's. This is the comparison most people are making when they say something like "Java is faster than python". Technically, this may be true (and in applications where speed is really a factor, and a lot of code is written in one of these languages, you WILL feel it) but the large base of native library code written in C for python will negate this for most small, and some large, applications. For example, if 90% of what you're doing is writing files to disk or over a network connection, you won't really care that Java does it in half the time, because you'll be waiting on the network.
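The "you'll be waiting on the network" argument is essentially Amdahl's law, and a quick calculation shows how little a faster interpreter helps an I/O-bound program. In this sketch (modern Python 3) the 10%/90% split is an assumption chosen to match the example in the text, and 5.3 is the Interpreter Speed ratio from the results table.

```python
def overall_speedup(fraction_sped_up, factor):
    """Amdahl's law: whole-program speedup when only part of the run gets faster."""
    return 1.0 / ((1.0 - fraction_sped_up) + fraction_sped_up / factor)


# Assumption for illustration: 10% of the run is interpreter-bound and 90% is
# spent waiting on disk or network. 5.3 is the Interpreter Speed ratio above.
print(overall_speedup(0.10, 5.3))

# For contrast, a program that is entirely interpreter-bound.
print(overall_speedup(1.0, 5.3))
```

Under those assumed fractions the first figure comes out at roughly 1.09, that is, about 9% faster overall, while the fully interpreter-bound case gets the whole 5.3X.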

Patterns in these Examples

Python is easier to program in than Java. I had experienced this in the past, but this time I kept track while doing these exercises. It took me roughly fifteen minutes to write all of the Python examples, and almost an hour to complete the Java equivalents. None of the Python programs had any errors in them when first run; I had 3 syntax errors to correct in Java.

Java's documentation is generally better, and better organised (at least the API doc) but even though I've been using Java for several years, and python for only a month, I found it easy to remember all of the python necessary for this exercise without resorting to the module doc; I had to look up 3 things about IOStreams and URLs in the Java documentation.

Java programs are longer than their equivalent python programs. In fact, they're approximately 3 times longer, if you're counting bytes in source code (remembering, also, that python programs can run from sources and Java must be compiled first). Totals for the programs I wrote (source only) were: python, 921 bytes, Java, 2,742 bytes.
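Counting source bytes per language is easy to reproduce. The following is a rough sketch in modern Python 3, assuming all the test sources sit in one directory and are distinguished by their `.py` and `.java` extensions; the function names are invented for this example.

```python
import os


def source_bytes(directory, extension):
    """Total size in bytes of files in `directory` whose names end with `extension`."""
    total = 0
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if name.endswith(extension) and os.path.isfile(path):
            total += os.path.getsize(path)
    return total


def size_ratio(directory):
    """Java source bytes divided by Python source bytes."""
    return source_bytes(directory, ".java") / source_bytes(directory, ".py")
```

With the totals reported above, 2,742 / 921 is about 2.98, which is where "approximately 3 times longer" comes from. Note that `size_ratio` will raise a ZeroDivisionError if the directory contains no `.py` files.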

Java is faster, and more suitable for "systems-level" programming. I believe that C -> C++ -> Java -> Python establishes a nice continuum from systems-oriented, low-level programming to application-oriented, high-level programming.

Reasoning Behind Obvious Mistakes

Many Java fans out there are probably asking "why did you choose JDK 1.1?" I know that 1.2 may perform better than this, and I am aware that I did not choose optimal (or even necessarily equivalent) algorithms for the Java test cases. This test is centered around implementing and deploying actual applications. JDK 1.2 is not available on most platforms yet (when I say "most", I'm talking about MacOS, BeOS, OS/2, and friends; with the recent resurgence of the Mac's popularity with the iMac, this is especially relevant). Those platforms that ship with Java have a 1.1 implementation, and those platforms that ship with Python have 1.5.2.

I would love to hear from other people who have run the same test-cases and come up with different results.

Cool things I like about python

Unresolved Release-Critical Bugs in Java

Before I start this, I do want to say that although it completely destroys performance of large applications, I won't be considering the state of the java garbage collector a 'bug', except in cases where it actually causes the VM to core.

From a language-design perspective, the next section won't be interesting; however, it's a very important look at the real-world consequences of choosing this language, unless you plan to write your own virtual machine (or use one of the free ones, like japhar or kaffe... which don't even necessarily have the APIs implemented that these bugs are in).

Don't Use Swing.

Java's much-touted platform-independent GUI is slow. Phenomenally slow. Aside from chewing processor cycles, it will destroy the stability of any application you want to write with it -- seeing as how it leaks memory at almost every turn.

Want an arbitrary number of frames in your app? Sorry; you'd better keep track, and set a hard limit. I wouldn't count on re-using components either; add() seems to leak like a sieve too. Widget re-parenting appears to leak as well, but the leak is small enough to be livable.

Opening about 100 windows with a decent amount of widgets will crash a VM -- and I don't mean "at once". I mean at all. Open window, close window, open window, close window... repeat 100 times, and boom, your app is *permanently* out of memory, unless you've loaded all the swing classes in a special classloader and you're ballsy enough to go re-loading them all every time that happens.

The AWT does this too, but you could probably write an application that ran for longer than 20 minutes using it.

Don't allocate memory.

You can't allocate more than 32M of objects. This gets worse with JDK 1.2, which ships with a default memory limit of 16M. While this can be hacked by manually adding command-line options, this isn't possible to specify in 'pure java'.

Even if you do decide to be brave and diddle the command-line options, the aforementioned problems with GC make it infeasible to allocate really large blocks of memory in Java without a ten-processor SPARCstation.

Don't use java.lang.String.intern

Interning too many strings will cause the interpreter to core-dump. In their infinite wisdom, Sun's engineers saw fit to use a *native* hashtable here, even though they didn't implement java.util.Hashtable natively. It's a buggy native hashtable with a hard limit on how many entries it can contain.

Don't expect your app to run

There is no standard cross-platform way to deploy a Java application. I don't mean 'there is no Sun-blessed method for distributing Java apps', I mean there's no way to guarantee that your app will run unless you package the VM with the application yourself.

This problem is so bad that even third-party applications that purport to solve this problem, such as Zero-G software's "Install Anywhere" will sometimes fail, simply because it's impossible to determine certain information about the operating environment. (No slight to "Install Anywhere" -- it's a GREAT product, and works very well for their supported platforms!)

Note: Some people have brought it to my attention that the same thing could be said about python; however, I have not found it difficult at all to run my python scripts on Windows or MacOS; the packaging process for Java applications is highly platform-specific and usually requires an intimate knowledge of where you want it to run... well, unless you're not using ANY libraries at all, and your entire app is a single class-file. Java2's -jar option makes this slightly easier to do, but it's still painful (not to mention undocumented) to determine the version and capabilities of the installed VM.

Don't print anything

Everyone knows that printing is a "problem" in Java (screen-resolution printing precision has its pros and cons; I won't get into DTP freaks' issues with the way that Java does printing right now). However, it's also broken. You can't print from a console application or, in fact, print at all on UNIX. The Linux JDK (which I assume uses the same printing code as the Solaris one?) generates corrupt PostScript that will hang Ghostscript and fail to print on a LaserWriter printer.

Don't write large apps

It's very difficult to figure out when you're running out of memory in java. It's rather unpleasant to get an OutOfMemoryError after one of the basic libraries has leaked all over the place, and be unable to predict that something like that was going to happen... attempting to write applications which allocate large amounts of memory is really unpleasant in java.

Don't write small apps

Java has a minimum memory usage of 4MB. So you can't write large apps, because you can't allocate that much memory, but you can't write utilities either, because the interpreter takes forever to initialize and takes up lots of memory.
Glyph Lefkowitz
Last modified: Fri Apr 7 14:28:25 EST 2000