[PATCH 4/4] RFC: migrations: 0043: Perform ALTER TABLE in one big batch

Daniel Axtens dja at axtens.net
Tue Aug 31 23:55:27 AEST 2021


Daniel Axtens <dja at axtens.net> writes:

> This saves us doing a whole bunch of rewrites sequentially - instead
> we just do one big rewrite.
>
> Currently breaks postgres and sqlite, which is annoying because they
> don't need this to go faster. OTOH it makes mysql a **lot** faster - from
> around 220s to ~45s, and that will scale with table size.
>
> Mostly, however, this is RFC because I am not sure about the random-ish
> strings in the middle of the constraint names. I took them from the
> `sqlmigrate` output but I don't know if they're random or special.

This is a doozy.

The random strings are created by names_digest in django. It's the first
8 hex digits of an md5 digest. The inputs to the md5 digest are a bit
fiddly to determine.

The name of the constraint is created by _fk_constraint_name in
django.db.backends.base.schema.BaseDatabaseSchemaEditor.
(AFAICT it is not overloaded.)

_fk_constraint_name is this:

    def _fk_constraint_name(self, model, field, suffix):
        def create_fk_name(*args, **kwargs):
            return self.quote_name(self._create_index_name(*args, **kwargs))

        return ForeignKeyName(
            model._meta.db_table,
            [field.column],
            split_identifier(field.target_field.model._meta.db_table)[1],
            [field.target_field.column],
            suffix,
            create_fk_name,
        )

Ignoring ForeignKeyName for now, we can infer that create_fk_name will
get called and will call _create_index_name. That does this:

    def _create_index_name(self, table_name, column_names, suffix=""):
        """
        Generate a unique name for an index/unique constraint.

        The name is divided into 3 parts: the table name, the column names,
        and a unique digest and suffix.
        """
        _, table_name = split_identifier(table_name)
        hash_suffix_part = '%s%s' % (names_digest(table_name, *column_names, length=8), suffix)
        max_length = self.connection.ops.max_name_length() or 200
        # If everything fits into max_length, use that name.
        index_name = '%s_%s_%s' % (table_name, '_'.join(column_names), hash_suffix_part)
        if len(index_name) <= max_length:
            return index_name
        # Shorten a long suffix.
        if len(hash_suffix_part) > max_length / 3:
            hash_suffix_part = hash_suffix_part[:max_length // 3]
        other_length = (max_length - len(hash_suffix_part)) // 2 - 1
        index_name = '%s_%s_%s' % (
            table_name[:other_length],
            '_'.join(column_names)[:other_length],
            hash_suffix_part,
        )
        # Prepend D if needed to prevent the name from starting with an
        # underscore or a number (not permitted on Oracle).
        if index_name[0] == "_" or index_name[0].isdigit():
            index_name = "D%s" % index_name[:-1]
        return index_name

So we call names_digest() on the table_name and the column_names.

So I feel fairly safe to say that it's going to be static and based on
the md5 of the table and related columns. I'll try to patch my local
django install and see what's getting passed to names_digest to confirm,
but we're _probably_ right to just use the names based on the SQL from
sqlmigrate.

Anyway I am trying to not spend all my nights coding so I will leave it
here for now and return to it later.

Kind regards,
Daniel

>
> Thoughts?
>
> Signed-off-by: Daniel Axtens <dja at axtens.net>
> ---
>  .../migrations/0043_merge_patch_submission.py | 195 +++++++++++-------
>  1 file changed, 117 insertions(+), 78 deletions(-)
>
> diff --git a/patchwork/migrations/0043_merge_patch_submission.py b/patchwork/migrations/0043_merge_patch_submission.py
> index 1d072ca18230..e17c394f3ffb 100644
> --- a/patchwork/migrations/0043_merge_patch_submission.py
> +++ b/patchwork/migrations/0043_merge_patch_submission.py
> @@ -4,9 +4,34 @@ import django.db.models.deletion
>  from django.db.models import Max
>  
>  import patchwork.fields
> +import datetime
>  
>  CHUNK_SIZE = 10000
>  
> +def report(apps, schema_editor):
> +    print(datetime.datetime.now())
> +
> +def add_fields(apps, schema_editor):
> +    # mysql - each can be a costly rewrite, bundle.
> +    # pgsql and others don't let us do multiple which is sad.
> +    schema_editor.execute(
> +        """
> +        ALTER TABLE `patchwork_submission` 
> +            ADD COLUMN `archived` bool DEFAULT b'0' NOT NULL,
> +            ADD COLUMN `commit_ref` varchar(255) NULL,
> +            ADD COLUMN `delegate_id` integer NULL , ADD CONSTRAINT `patchwork_submission_delegate_id_4b8639b8_fk_auth_user_id` FOREIGN KEY (`delegate_id`) REFERENCES `auth_user`(`id`),
> +            ADD COLUMN `diff` longtext NULL,
> +            ADD COLUMN `hash` char(40) NULL,
> +            ADD COLUMN `number` smallint UNSIGNED NULL,
> +            ADD COLUMN `pull_url` varchar(255) NULL,
> +            ADD COLUMN `related_id` integer NULL , ADD CONSTRAINT `patchwork_submission_related_id_2e3e66e8_fk_patchwork` FOREIGN KEY (`related_id`) REFERENCES `patchwork_patchrelation`(`id`),
> +            ADD COLUMN `series_id` integer NULL , ADD CONSTRAINT `patchwork_submission_series_id_61c13d16_fk_patchwork_series_id` FOREIGN KEY (`series_id`) REFERENCES `patchwork_series`(`id`),
> +            ADD COLUMN `state_id` integer NULL , ADD CONSTRAINT `patchwork_submission_state_id_947655b7_fk_patchwork_state_id` FOREIGN KEY (`state_id`) REFERENCES `patchwork_state`(`id`);
> +        ALTER TABLE `patchwork_submission`
> +            ALTER COLUMN `archived` DROP DEFAULT;
> +        """
> +    )
> +
>  def migrate_data(apps, schema_editor):
>      Patch = apps.get_model('patchwork', 'patch')
>      max_id = Patch.objects.all().aggregate(Max('submission_ptr_id'))['submission_ptr_id__max']
> @@ -101,6 +126,7 @@ class Migration(migrations.Migration):
>  
>      operations = [
>          # move the 'PatchTag' model to point to 'Submission'
> +        migrations.RunPython(report, None, atomic=False),
>  
>          migrations.RemoveField(model_name='patch', name='tags',),
>          migrations.AddField(
> @@ -118,6 +144,7 @@ class Migration(migrations.Migration):
>                  to='patchwork.Submission',
>              ),
>          ),
> +        migrations.RunPython(report, None, atomic=False),
>  
>          # do the same for any other field that references 'Patch'
>  
> @@ -166,6 +193,7 @@ class Migration(migrations.Migration):
>                  to='patchwork.Submission',
>              ),
>          ),
> +        migrations.RunPython(report, None, atomic=False),
>  
>          # rename all the fields on 'Patch' so we don't have duplicates when we
>          # add them to 'Submission'
> @@ -209,94 +237,104 @@ class Migration(migrations.Migration):
>                  ),
>              ],
>          ),
> +        migrations.RunPython(report, None, atomic=False),
>  
>          # add the fields found on 'Patch' to 'Submission'
>  
> -        migrations.AddField(
> -            model_name='submission',
> -            name='archived',
> -            field=models.BooleanField(default=False),
> -        ),
> -        migrations.AddField(
> -            model_name='submission',
> -            name='commit_ref',
> -            field=models.CharField(blank=True, max_length=255, null=True),
> -        ),
> -        migrations.AddField(
> -            model_name='submission',
> -            name='delegate',
> -            field=models.ForeignKey(
> -                blank=True,
> -                null=True,
> -                on_delete=django.db.models.deletion.CASCADE,
> -                to=settings.AUTH_USER_MODEL,
> -            ),
> -        ),
> -        migrations.AddField(
> -            model_name='submission',
> -            name='diff',
> -            field=models.TextField(blank=True, null=True),
> -        ),
> -        migrations.AddField(
> -            model_name='submission',
> -            name='hash',
> -            field=patchwork.fields.HashField(
> -                blank=True, max_length=40, null=True
> -            ),
> -        ),
> -        migrations.AddField(
> -            model_name='submission',
> -            name='number',
> -            field=models.PositiveSmallIntegerField(
> -                default=None,
> -                help_text='The number assigned to this patch in the series',
> -                null=True,
> -            ),
> -        ),
> -        migrations.AddField(
> -            model_name='submission',
> -            name='pull_url',
> -            field=models.CharField(blank=True, max_length=255, null=True),
> -        ),
> -        migrations.AddField(
> -            model_name='submission',
> -            name='related',
> -            field=models.ForeignKey(
> -                blank=True,
> -                null=True,
> -                on_delete=django.db.models.deletion.SET_NULL,
> -                related_name='patches',
> -                related_query_name='patch',
> -                to='patchwork.PatchRelation',
> -            ),
> -        ),
> -        migrations.AddField(
> -            model_name='submission',
> -            name='series',
> -            field=models.ForeignKey(
> -                blank=True,
> -                null=True,
> -                on_delete=django.db.models.deletion.CASCADE,
> -                related_name='patches',
> -                related_query_name='patch',
> -                to='patchwork.Series',
> -            ),
> -        ),
> -        migrations.AddField(
> -            model_name='submission',
> -            name='state',
> -            field=models.ForeignKey(
> -                null=True,
> -                on_delete=django.db.models.deletion.CASCADE,
> -                to='patchwork.State',
> -            ),
> +        migrations.SeparateDatabaseAndState(
> +            state_operations=[
> +                    migrations.AddField(
> +                    model_name='submission',
> +                    name='archived',
> +                    field=models.BooleanField(default=False),
> +                ),
> +                migrations.AddField(
> +                    model_name='submission',
> +                    name='commit_ref',
> +                    field=models.CharField(blank=True, max_length=255, null=True),
> +                ),
> +                migrations.AddField(
> +                    model_name='submission',
> +                    name='delegate',
> +                    field=models.ForeignKey(
> +                        blank=True,
> +                        null=True,
> +                        on_delete=django.db.models.deletion.CASCADE,
> +                        to=settings.AUTH_USER_MODEL,
> +                    ),
> +                ),
> +                migrations.AddField(
> +                    model_name='submission',
> +                    name='diff',
> +                    field=models.TextField(blank=True, null=True),
> +                ),
> +                migrations.AddField(
> +                    model_name='submission',
> +                    name='hash',
> +                    field=patchwork.fields.HashField(
> +                        blank=True, max_length=40, null=True
> +                    ),
> +                ),
> +                migrations.AddField(
> +                    model_name='submission',
> +                    name='number',
> +                    field=models.PositiveSmallIntegerField(
> +                        default=None,
> +                        help_text='The number assigned to this patch in the series',
> +                        null=True,
> +                    ),
> +                ),
> +                migrations.AddField(
> +                    model_name='submission',
> +                    name='pull_url',
> +                    field=models.CharField(blank=True, max_length=255, null=True),
> +                ),
> +                migrations.AddField(
> +                    model_name='submission',
> +                    name='related',
> +                    field=models.ForeignKey(
> +                        blank=True,
> +                        null=True,
> +                        on_delete=django.db.models.deletion.SET_NULL,
> +                        related_name='patches',
> +                        related_query_name='patch',
> +                        to='patchwork.PatchRelation',
> +                    ),
> +                ),
> +                migrations.AddField(
> +                    model_name='submission',
> +                    name='series',
> +                    field=models.ForeignKey(
> +                        blank=True,
> +                        null=True,
> +                        on_delete=django.db.models.deletion.CASCADE,
> +                        related_name='patches',
> +                        related_query_name='patch',
> +                        to='patchwork.Series',
> +                    ),
> +                ),
> +                migrations.AddField(
> +                    model_name='submission',
> +                    name='state',
> +                    field=models.ForeignKey(
> +                        null=True,
> +                        on_delete=django.db.models.deletion.CASCADE,
> +                        to='patchwork.State',
> +                    ),
> +                ),
> +            ],
> +            database_operations=[
> +                migrations.RunPython(add_fields, None, atomic=False),
> +            ]
>          ),
>  
>          # copy the data from 'Patch' to 'Submission'
> +        migrations.RunPython(report, None, atomic=False),
>  
>          migrations.RunPython(migrate_data, None, atomic=False),
>  
>          # configure metadata for the 'Submission' model
> +        migrations.RunPython(report, None, atomic=False),
>  
>          migrations.AlterModelOptions(
>              name='submission',
> @@ -330,6 +368,7 @@ class Migration(migrations.Migration):
>          ),
>  
>          # drop the 'Patch' model and rename 'Submission' to 'Patch'
> +        migrations.RunPython(report, None, atomic=False),
>  
>          migrations.DeleteModel(name='Patch',),
>          migrations.RenameModel(old_name='Submission', new_name='Patch',),
> -- 
> 2.30.2


More information about the Patchwork mailing list