web2py: Normalized many-to-many model with multiselect drop-down

This article is from 2014 and has been archived. It's old and probably outdated.

The Python framework web2py is pretty powerful when it comes to automagical form and data handling. A quite common use case in an application is the need to manage a business object that might have many relations to some other business object, e.g. one package might consist of many components. If you use one database table for each object there are at least two approaches to create a many-to-many relation in web2py. ## Approach One (simple): list:reference

Let us have a look at this DAL (Database Abstraction Layer) model:

db.define_table(’component’,
                Field(’a_component_field’),
                format=’%(a_component_field)s’)

db.define_table(’package_one’,
                Field(’a_package_field’),
                Field(’component_ids’, ‘list:reference component’)
                format=’%(a_package_field)s’)

When you open the appadmin database administration section of the application, the form to add a new package should look like this, given there are some components already:

Manage packages form screenshot

You just have to use list:reference component and you will get a multiselect drop-down.

The list:reference is described in the web2py manual as:

The list:reference and the contains(value) operator are particularly useful to de-normalize many-to-many relations. [...]

Notice that a list:reference tag field get a default constraint

requires = IS_IN_DB(db,’tag.id’,db.tag._format,multiple=True)

that produces a SELECT/OPTION multiple drop-box in forms.

Why might that approach be problematic in a relational database?

list:reference stores the referenced IDs in the corresponding field as a string, concatenated by a vertical bar (or pipe). This is a classical SQL antipattern (book recommendation) for various reasons, some of them are:

  • You cannot ensure uniqueness.
  • It violates the first normal form (1NF).
  • It is not possible to delete values without reading all values, parsing them and storing all but the obsolete values again.
  • Counting is difficult.
  • Query performance might be an issue when the table contains lots of these values.
  • A string field of an SQL table has a limited size depending on its data type. If not chosen properly, this will literally cut references.
  • The relation between the objects cannot have specific properties.

Some issues are fortunately treated by the DAL, transparent to the developer, some cannot be solved that way. There are also times where you might want to query the database directly using plain SQL or some other query tool (Adminer, PhpMyAdmin, DB Manager in your IDE) which requires a consistent database structure independent from DAL.

I recommend to carefully think about the entity relationship model and try to estimate the likely amount of data and the type of data you want to store with this approach. web2py also offers other list types like list:string and list:integer that might be used for a small amount of denormalized data.

A clear advantage of this approach: It takes almost no time to implement and is therefore definitely useful for quick prototypes or small applications.

Approach two (failed): Intersection table and IS_IN_DB requirement

The next option naturally comes in mind: Create an intersection table and use the IS_IN_DB validator to create a multiselect drop-down. Unfortunately, this approach fails and causes an exception when the form is processed:

int() argument must be a string or a number, not ‘list’

Model:

db.define_table(’component’,
                Field(’a_component_field’),
                format=’%(a_component_field)s’)

db.define_table(’package’,
                Field(’a_package_field’),
                format=’%(a_package_field)s’)

db.define_table(’component_package_association’,
                Field(’component_id’, ‘reference component’),
                Field(’package_id’, ‘reference package’))

db.component_package_association.component_id.requires = IS_IN_DB(
    db, ‘component.id’, ‘%(a_component_field)s’, multiple=True)

Controller:

def manage_packages():
    if request.args(0) == ‘new’ and request.args(1) == ‘package’:
        form = SQLFORM.factory(db.package, db.component_package_association)

        if form.process().accepted:
            # Code to store the new package and associations to components
            # ...
            response.flash = ‘Package has been created successfully.’

        content = form
    else:
        content = SQLFORM.grid(db.package)

    return dict(content=content)

web2py’s DAL does not know how to validate component_package_association.component_id when a list of IDs is supplied. I could not find a way to get it working.

Approach three (recommended): Intersection table with custom insert handling

This way takes into account the problems mentioned above while retaining the view and behaviour for the user. Caveat: The admin will not have a multiple drop-down in the appadmin database administration section since a modified controller is needed.

Model:

db.define_table(’component’,
                Field(’a_component_field’),
                format=’%(a_component_field)s’)

db.define_table(’package’,
                Field(’a_package_field’),
                format=’%(a_package_field)s’)

db.define_table(’component_package_association’,
                Field(’component_id’, ‘reference component’),
                Field(’package_id’, ‘reference package’))

Controller:

def manage_packages():
    if request.args(0) == ‘new’ and request.args(1) == ‘package’:

        # (1) Get available components
        components = [(r.id, r.a_component_field) for r in db(db.component).select()]

        # (2) Build the form
        form = SQLFORM.factory(
            db.package,
            Field(
                ‘components’,
                requires=IS_IN_SET(components, multiple=True)
            )
        )

        # (3) Validate form data
        if form.process().accepted:

            # (4) Insert package
            package_id = db.package.insert(
                **db.package._filter_fields(form.vars))

            if package_id and form.vars.components:

                # (5) Insert component package associations
                for component_id in form.vars.components:
                    existing_component = db.component(component_id)

                    if existing_component:
                        db.component_package_association.insert(
                            package_id=package_id,
                            component_id=existing_component
                        )

                response.flash = ‘Package has been created successfully.’

        content = form
    else:
        content = SQLFORM.grid(db.package)

    return dict(content=content)

This sample controller can properly add new packages and list packages:

  1. It queries available components and builds a list containing sets with component primary keys and a description field.
  2. It builds a custom field that is treated as a set. The IS_IN_SET validator creates the same multiselect drop-down as in the previous approaches, it is predestinated to be applied to lists containing sets and treats the ids and labels properly.
  3. The form data is validated, the IS_IN_SET is truthy.
  4. The package is inserted and the id is returned
  5. Each component_id is stored together with the package_id as a new association

Please note: The edit function of the SQLFORM.grid will use the default form to edit one element of a table. The code for adding packages should provide an idea of how to implement the same functionality for editing entries. In a real application additional checks and error handling are required.

If you find issues with my approach or know how to improve it further, please drop a comment.

Jan Beilicke

About the author

Long-time IT professional and full-time nerd. Open source enthusiast, advocating security and privacy. Sees the cloud as other people's computers. Find me on Mastodon or Twitter.