Extending Vs. Embedding

There is Only One Correct Decision



At some point in every Python programmer's life, they will be forced to make an apparently confusing and subtle choice: to extend or embed Python in their particular application. I have heard many people ask how to do this, and I have heard many bizarre arguments for both strategies. It is especially unfortunate that due to the nature of this problem, it will probably surface early -- perhaps even immediately -- as a newcomer to Python is trying the language out. Since this decision will often have far-reaching effects on your application's runtime environment, it is important to understand the ramifications of the decision.

The only correct way to integrate Python with an application is to extend. Here's why.

Why Python?

If you are writing a C or C++ program that needs to be extended in ways that you might not have predicted, or needs a high-level command language to drive a rich set of internal operations, Python is definitely a good idea. The native interfaces are clear and concise, the programming language is easy to learn, and the power you will get from integrating Python support is quite surprising.

If you are writing a C or C++ program whose every possible use you believe you have anticipated, and which does not need any sort of scripting or control language, you are probably wrong. That's a topic for another day :-)

What is Embedding?

Embedding is inserting calls into your C or C++ application after it has started up in order to initialize the Python interpreter and call back to Python code at specific times. In order for Python to do useful things in your application, you will probably also need to create a Python Module object and somehow insert it into the Python run-time.

What is Extending?

Extending is writing a shared library that the Python interpreter can load as part of an import statement. This means that your application no longer has a main() function, but is a set of library functions that Python code can call.

Huh? This is pretty light on technical detail.

You can read the extending and embedding tutorial for information about C integration, and find general information on the Python website.

So why should I extend rather than embed? They sound very similar.

There are philosophical reasons to care about the difference, but they are all derived from two premises, one technical and one aesthetic. The technical question is, "do you want your application to inter-operate with other applications that Python can use?" The aesthetic question is, "do you want to confuse, surprise, and annoy people who may be familiar with Python from elsewhere?" I will assume that you want an application that can re-use as much code as possible from elsewhere and that does not confuse and annoy its developers.

Integrating with Other Code

If you embed Python in your application, then you are making Python programs which control your application dependent upon their run-time environment. Obviously, two applications, each with its own C "main()", can't start up inside the same process and have the same Python code communicate with both. Both will have initialization-order requirements that must be resolved. No two applications which embed Python can be used by the same script. You are also assuming responsibility for initializing the Python interpreter, which means that in order to integrate with another embedding application, you will need to reconcile the two applications' approach to Python initialization. Finally, you need to integrate the two applications' build processes: you may need to tweak one or both build settings to allow Python's dynamic module loader to work properly - this gets into some pretty deep platform-specific magic. This is all quite a lot of work!

If you extend, you are required to separate initialization code from execution code. You don't have to worry about Python being initialized, because it will already be initialized by the time it loads your code! Your initialization code will be run the first time a user imports your module, and your module will automatically be placed in the user's namespace with no additional work on your part. Also, you can easily automate your build process with the handy cross-platform distutils building and distribution tool. Once you've written an extension module (barring obvious conflicts, such as calling the same C APIs in incompatible ways, of course), your code will work both with other extension modules and with most applications written by people who have carelessly decided to embed!
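To make the "initialization runs on first import" point concrete, here is a minimal sketch. It uses a pure-Python module as a stand-in for a compiled extension, on the assumption that the same import machinery drives both. The module name myapp_demo and its frobnicate function are made up for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module standing in for a compiled extension module.
MODULE_SOURCE = '''
init_count = 0

def _initialize():
    global init_count
    init_count += 1

_initialize()  # initialization code: runs as part of the first import only

def frobnicate(x):
    return x * 2
'''

def import_twice():
    """Write the stand-in module to a temporary directory and import it twice."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "myapp_demo.py").write_text(MODULE_SOURCE)
        sys.path.insert(0, tmp)
        try:
            first = importlib.import_module("myapp_demo")
            second = importlib.import_module("myapp_demo")  # served from sys.modules
        finally:
            sys.path.remove(tmp)
    return first, second

first, second = import_twice()
print(first is second)   # both imports yield the same module object
print(first.init_count)  # the initialization code ran once
```

The user's script never has to ask whether the interpreter or the module is "ready"; importing is the whole protocol.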

Trying Not to Annoy Your Developers

Your developers are probably already angry at you for embedding, given that they can't use the other libraries and applications they're used to having at their disposal in Python. In addition to that, you are going to have to work to develop an initialization scheme for Python. This means you are going to have to come up with a way to get the functions that your program exposes into the Python user's namespace. The right way to do this is to fake extending: to create a module object and make it so the user can import it. This is what a Python developer will expect.

However, perhaps because this is easier to implement, perhaps because developers who embed don't have a proper appreciation of how to use Python, or perhaps because it seems more convenient at an interactive interpreter, embedders will almost inevitably end up pushing some "special" variables into the user's namespace which are new built-in variables. Now not only have you made the developer unable to use her favorite tools, you've made her code context-dependent so she can't stub out your functionality when using the script elsewhere! This also makes it difficult for your developer to modularize her code; since the built-in variables are sometimes only accessible in the __main__ module, other modules will have to do contortions in order to get back to where they are. If they don't do said contortions, then ALL modules used to enhance your application will be totally dependent upon it.

Speaking of that interactive interpreter -- you are going to have to implement your own read/evaluate/print loop. The less this behaves like the Python console, the more upset developers are going to be. Keep in mind that these tools that developers might want to "integrate" with aren't just extra functionality (which you may think extraneous) but the development tools they use to write Python code, which are themselves written in Python!
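If you are stuck embedding, "faking extending" might look roughly like the sketch below. It is written in Python for brevity; a real embedding application would more likely do the equivalent from C (for example by registering the module with PyImport_AppendInittab before initializing the interpreter). The module name myapp and the get_title function are hypothetical.

```python
import sys
import types

def install_app_module(name="myapp"):
    """Create a module object and register it so that `import myapp` works."""
    module = types.ModuleType(name)
    module.__doc__ = "Functions exposed by the host application."

    def get_title():
        # Stand-in for a call into the host application.
        return "Untitled document"

    module.get_title = get_title
    sys.modules[name] = module  # the import system consults sys.modules first
    return module

install_app_module()

# User scripts now reach the application through an ordinary import,
# not through "special" built-in variables.
import myapp
print(myapp.get_title())  # prints "Untitled document"
```

Because the script only depends on something importable named myapp, a developer can drop a stub module of the same name onto her path and run the same script outside your application.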

But why fake extending when you can actually do it? There's not much to add here. Since your developers will actually be using the "python" executable to load your library, they will be able to interact with it in a manner they already understand. Importing your application will work like importing any other modules. Installing your extensions will work like installing anything else that builds with distutils, and they'll be able to use their favorite Python-powered integrated development environment to work with their scripts.

Misguided Objections in Favor of Embedding

But I need my code to be Secure, so my customers can't see it!

If you're using Python at all, you're going to have to work pretty hard at that. The decompyle reverse-engineering tool plus low-level tools like SoftICE are going to make your work difficult, if not impossible. Simply re-locating your main point is not going to help.

If you really feel like you need to do this, extend normally during development, then create a custom build process to deploy, possibly involving this excellent distribution tool for Windows.

As a last resort, try to trust your customers more.

But most of my application is written in C!

Well, that must have been a lot of work! So why would you want to make MORE work for yourself by embedding Python rather than extending it? Extending is worth it, both in terms of the effort it eventually saves you and in terms of the integration benefits you get.

But I've got this really hoary initialization process where I bootstrap the whole system from a single-file binary which self-extracts from a compressed archive onto a RAM disk and then I use this LD_PRELOAD trick to redirect printf from libc to... [ad nauseam]

If this is not your fault, then I may have sympathy for you needing to embed for a while, until you can repair the godawful mess that is your initialization process. Starting up should not be the most complicated thing that your program does!

If your application is actually difficult to refactor into a library, then you need to start looking at why. It probably means that your run-time and initialization-time code are very tightly bound together, or you have something too creative for your own good in your build process. As I said, you can use embedding as a stopgap measure, but it is no substitute for clean code.

But I have this specialized requirement where I need to load a 3rd party DLL before Python...

In this case, modify the source for the Python main point to satisfy your requirement. Or, use the LD_PRELOAD trick above ;-). Before you do, though, are you sure you need to do this before Python is loaded? I've heard this objection maybe 15 times, and it's always been wrong.

But I have to run this hyper-efficient main-loop in C!

That's fine. The main loop can be a function callable from Python, and need not return. It can then call out to callbacks registered through other PyObjects. gtk.mainloop() from PyGTK and Tkinter.Tk.mainloop() -- from the Python distribution itself! -- both do this.
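The shape of that arrangement can be sketched as follows. In a real extension, mainloop() would be the C function and could run forever; here it is a plain Python stand-in fed a finite list of events so the sketch terminates. The names register, mainloop, and the "key" and "quit" events are all invented for illustration.

```python
from collections import defaultdict

# Event name -> list of Python callables registered by user scripts.
_callbacks = defaultdict(list)

def register(event_name, callback):
    """Register a Python callable to be invoked when event_name occurs."""
    _callbacks[event_name].append(callback)

def mainloop(events):
    """Stand-in for a C main loop: dispatch each event to its callbacks."""
    for name, payload in events:
        if name == "quit":
            break
        for callback in _callbacks.get(name, ()):
            callback(payload)

seen = []
register("key", seen.append)
mainloop([("key", "a"), ("key", "b"), ("quit", None), ("key", "c")])
print(seen)  # prints ['a', 'b']
```

The script sets up its callbacks, hands control to the loop, and the loop owns the process from then on, which is exactly what you wanted from your C main().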

But I want to embed multiple interpreters and/or configuration modules into my C application!

This is perhaps the most subtle objection, and it seems to be becoming more common. There are two parts to the response.

Python isn't just a language.

You can easily use the Python interpreter as your module loader. Python has an execution environment and object model that are extremely flexible and can easily be used to manage your plug-in system, even if the plug-ins you are loading are actually invoking other scripting languages.

I have yet to see a compelling reason to replicate the work done in Python's portable native module loading system. Invoking an object is not much more than an indirect function call (efficiency is not a concern), it is tricky to get a cross-platform equivalent to dlopen right, and modules that are written using Python's interfaces will have the advantage of being accessible and testable from one of your scripting languages with no additional work.
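As a rough illustration of using Python's import system as the plug-in loader, here is a minimal sketch. The plug-in names and the transform() entry point are assumptions made up for this example, and the plug-ins are pure Python for brevity; the loading call itself would not change for plug-ins built as native extension modules.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical plug-ins; each exposes an agreed-upon transform() entry point.
PLUGIN_SOURCES = {
    "plugin_upper": "def transform(text):\n    return text.upper()\n",
    "plugin_reverse": "def transform(text):\n    return text[::-1]\n",
}

def load_plugins(names):
    """Import each plug-in by name and collect its transform() entry point."""
    return {name: importlib.import_module(name).transform for name in names}

with tempfile.TemporaryDirectory() as plugin_dir:
    # Write the plug-ins to a directory and put that directory on the path.
    for name, source in PLUGIN_SOURCES.items():
        Path(plugin_dir, name + ".py").write_text(source)
    sys.path.insert(0, plugin_dir)
    try:
        plugins = load_plugins(sorted(PLUGIN_SOURCES))
    finally:
        sys.path.remove(plugin_dir)

results = {name: fn("abc") for name, fn in plugins.items()}
print(results)
```

Everything platform-specific about locating and loading the code is delegated to the interpreter; the application only decides which names to import and what entry point to call.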

Access from multiple scripting languages is not a good idea.

In an article I posted to advogato, I described my position on this in more detail, but the gist is that it is extremely difficult to design an interface which makes sense and helps people be productive in one language. Doing it several times over is a fantastic challenge with little benefit - in fact, the effort may backfire.

Consider that, if you maintain multiple scripting interfaces, your application will develop a ragged community of customization fans, who will regularly argue about what language to use to extend your software. If you support Perl, Ruby, Python, and Scheme, you will certainly find Python and Perl fans who are using your application constantly at each other's throats, re-implementing the same 'script' functionality 8 or 9 times, and generally wasting time that could be better spent enhancing their plug-in packages.

If you support only one scripting language, but you support it well, then you will create a simple environment into which contributors may put enhancements, and you will facilitate sharing and understanding between those contributors. Focusing on a single language and supporting it well will almost certainly result in more general-use code getting written for your particular app than in a multi-language scenario. (x-chat's scripting subsystem is a case in point.)

Conclusions

Embedding takes more work than extending. Extending gives you more power and flexibility than embedding. Many useful Python tools and automation techniques are much harder, if not impossible, to use if you're embedding.

Based on my experience with other projects and programmers, you will very probably like Python. As you grow to like it more, you will want to move more of your functionality into it, and control more options from it. Prepare for this realization and growth by adopting the Python module-based integration model, and its consequence, extending, as soon as you can.


Feedback is appreciated.
Glyph Lefkowitz
Last modified: Sat Jun 14 00:48:11 CDT 2003