Migration Coordination Design

How the system coordinates schema migrations with concurrent data operations across all endpoints, and how the frontend should respond to each migration state.

Problem Statement

When an admin triggers a schema migration (column type change, add NOT NULL, etc.), PostgreSQL acquires locks on the affected table. Depending on the operation, this can block all concurrent reads and writes — affecting:

  • Data endpoints (/api/data/{view}/rows, cell edits, bulk operations)
  • Workspace API (auto-generated REST API used by external consumers)
  • MCP API (AI assistant tool calls)
  • Other SSE-connected users viewing the same or related tables
  • FK-related tables that cascade into the migrating table

Currently, none of these systems knows a migration is happening: requests hit lock waits, time out, and return 500s.


PostgreSQL Lock Levels per Operation

Every ALTER TABLE acquires a lock. The lock type determines what concurrent operations are blocked.

| ALTER TABLE Operation | Lock Level | Table Rewrite? | Duration | Blocks SELECT? | Blocks INSERT/UPDATE/DELETE? |
| --- | --- | --- | --- | --- | --- |
| ADD COLUMN (nullable, no default) | AccessExclusive | No | Instant | Briefly | Briefly |
| ADD COLUMN ... DEFAULT x (PG 11+) | AccessExclusive | No | Instant | Briefly | Briefly |
| ADD COLUMN ... DEFAULT x (PG < 11) | AccessExclusive | Yes | Proportional to rows | Yes | Yes |
| DROP COLUMN | AccessExclusive | No | Instant | Briefly | Briefly |
| ALTER COLUMN ... TYPE (type change) | AccessExclusive | Yes | Proportional to rows | Yes | Yes |
| ALTER COLUMN ... SET NOT NULL | AccessExclusive | No (scan only) | Proportional to rows | Yes | Yes |
| ALTER COLUMN ... DROP NOT NULL | AccessExclusive | No | Instant | Briefly | Briefly |
| ALTER COLUMN ... SET DEFAULT | AccessExclusive | No | Instant | Briefly | Briefly |
| ALTER COLUMN ... DROP DEFAULT | AccessExclusive | No | Instant | Briefly | Briefly |
| ADD CONSTRAINT ... UNIQUE | ShareLock | No (index build) | Proportional to rows | No | Yes |
| ADD CONSTRAINT ... FOREIGN KEY | ShareRowExclusive | No | Fast (validates rows) | No | Yes, on both tables |
| DROP CONSTRAINT | AccessExclusive | No | Instant | Briefly | Briefly |
| RENAME COLUMN | AccessExclusive | No | Instant | Briefly | Briefly |

Key Insight

"AccessExclusive" sounds scary, but for instant operations the lock is held for microseconds — effectively invisible. The danger is table rewrites and full table scans where the lock is held for seconds to minutes.

MySQL Lock Levels per Operation

MySQL's InnoDB supports Online DDL (since 5.6) with three algorithms: INSTANT (8.0.12+), INPLACE, and COPY. The chosen algorithm determines lock behavior.

| ALTER TABLE Operation | Algorithm | Permits Concurrent DML? | Table Rebuild? | Duration | Blocks SELECT? | Blocks INSERT/UPDATE/DELETE? |
| --- | --- | --- | --- | --- | --- | --- |
| ADD COLUMN (last position, 8.0.12+) | INSTANT | Yes | No | Instant | No | No |
| ADD COLUMN (any position, 8.0.29+) | INSTANT | Yes | No | Instant | No | No |
| ADD COLUMN (older MySQL) | INPLACE | Yes | Yes | Proportional to rows | No | Brief (start/end only) |
| DROP COLUMN (8.0.29+) | INSTANT | Yes | No | Instant | No | No |
| DROP COLUMN (older MySQL) | INPLACE | Yes | Yes | Proportional to rows | No | Brief |
| MODIFY COLUMN ... TYPE (type change) | COPY | No | Yes | Proportional to rows | No | Yes |
| MODIFY COLUMN ... NOT NULL | INPLACE | Yes | Yes | Proportional to rows | No | Brief |
| MODIFY COLUMN ... NULL | INPLACE | Yes | Yes | Proportional to rows | No | Brief |
| ALTER COLUMN ... SET DEFAULT | INSTANT | Yes | No | Instant | No | No |
| ALTER COLUMN ... DROP DEFAULT | INSTANT | Yes | No | Instant | No | No |
| ADD INDEX / UNIQUE | INPLACE | Yes | No | Proportional to rows | No | Brief (start/end only) |
| ADD FOREIGN KEY | INPLACE | Yes | No | Fast (validates rows) | No | Brief |
| DROP FOREIGN KEY | INPLACE | Yes | No | Instant | No | No |
| DROP INDEX | INPLACE | Yes | No | Instant | No | No |
| RENAME COLUMN | INSTANT | Yes | No | Instant | No | No |

Key Differences from PostgreSQL

| Aspect | PostgreSQL | MySQL (InnoDB) |
| --- | --- | --- |
| ADD COLUMN (nullable) | AccessExclusive (brief) | INSTANT (no lock at all, 8.0.12+) |
| DROP COLUMN | AccessExclusive (brief) | INSTANT (8.0.29+) or INPLACE (rebuild) |
| ALTER COLUMN TYPE | Full table rewrite, blocks everything | COPY algorithm, blocks DML but not SELECT |
| SET NOT NULL | Full table scan, blocks everything | INPLACE rebuild, allows concurrent reads AND writes |
| ADD UNIQUE INDEX | ShareLock (blocks writes) | INPLACE, allows concurrent DML |
| ADD FOREIGN KEY | ShareRowExclusive (blocks writes on both tables) | INPLACE, brief lock only |
| Concurrent reads during rewrite | Blocked | Allowed (except COPY algorithm) |

Critical Implication

The same migration can have completely different impact depending on the database vendor:

  • SET NOT NULL: PostgreSQL blocks all reads/writes during scan. MySQL allows full concurrent DML.
  • ALTER TYPE: PostgreSQL blocks everything. MySQL blocks DML but still allows SELECT.
  • ADD COLUMN: PostgreSQL takes AccessExclusive briefly. MySQL is truly zero-lock (INSTANT).

The impact classification must be vendor-aware. The system already knows the workspace database vendor — use it.
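As a sketch of what vendor-aware classification means in practice — the enums and method below are illustrative, not the real analyzer API:

```java
// Illustrative sketch only — enum and method names are hypothetical,
// not the actual classifier used by the system.
enum Vendor { POSTGRESQL, MYSQL }
enum Op { ADD_NULLABLE_COLUMN, SET_NOT_NULL, CHANGE_TYPE }
enum Impact { TRANSPARENT, BRIEF, BLOCKING }

class VendorAwareClassifier {
    static Impact classify(Vendor vendor, Op op) {
        switch (op) {
            case ADD_NULLABLE_COLUMN:
                // PG: brief AccessExclusive; MySQL 8.0.12+: INSTANT. Invisible either way.
                return Impact.TRANSPARENT;
            case SET_NOT_NULL:
                // PG: full table scan under AccessExclusive; MySQL: INPLACE with concurrent DML.
                return vendor == Vendor.POSTGRESQL ? Impact.BLOCKING : Impact.TRANSPARENT;
            case CHANGE_TYPE:
                // Rewrite (PG) or COPY (MySQL) — Level 3 on both vendors,
                // though whether reads block differs per vendor.
                return Impact.BLOCKING;
            default:
                throw new IllegalArgumentException("unknown op: " + op);
        }
    }
}
```

The point is that the branch on `vendor` is unavoidable: the same `Op` maps to different levels depending on the workspace database.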

MySQL Version Detection

MySQL's INSTANT algorithm availability depends on version:

| Feature | MySQL Version Required |
| --- | --- |
| INSTANT add column (last position) | 8.0.12+ |
| INSTANT add column (any position) | 8.0.29+ |
| INSTANT drop column | 8.0.29+ |
| INSTANT rename column | 8.0 |
| Online DDL (INPLACE) | 5.6+ |

The processor should query SELECT VERSION() and cache the result per workspace to determine available algorithms.
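The version gate could be a small pure function over the `SELECT VERSION()` string — a sketch of the comparison logic only (the cache and its service name are out of scope here):

```java
// Sketch: parse a MySQL VERSION() string and gate INSTANT availability.
// Handles suffixed strings like "8.0.35-log".
class MysqlVersionGate {
    static int[] parse(String version) {
        // "8.0.35" or "8.0.35-log" -> {8, 0, 35}
        String[] parts = version.split("[^0-9.]")[0].split("\\.");
        return new int[] {
            Integer.parseInt(parts[0]),
            parts.length > 1 ? Integer.parseInt(parts[1]) : 0,
            parts.length > 2 ? Integer.parseInt(parts[2]) : 0
        };
    }

    static boolean atLeast(String version, int maj, int min, int patch) {
        int[] v = parse(version);
        if (v[0] != maj) return v[0] > maj;
        if (v[1] != min) return v[1] > min;
        return v[2] >= patch;
    }

    static boolean supportsInstantDropColumn(String version) {
        return atLeast(version, 8, 0, 29);  // per the table above
    }
}
```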


Migration Impact Classification

Each migration should be classified at detection time (before queuing) into one of these impact levels. Classification is vendor-specific.

Level 1: Transparent

  • Lock duration: < 100ms (instant metadata-only operations)
  • Affected operations: None in practice
  • Frontend state: No change needed — migration completes before anyone notices
  • Data endpoints: Normal operation

PostgreSQL operations:

  • Add nullable column (no default)
  • Drop column
  • Drop NOT NULL
  • Set/drop default value
  • Rename column

MySQL operations (all of the above, plus):

  • Add column with DEFAULT (INSTANT, 8.0.12+)
  • Drop column (INSTANT, 8.0.29+)
  • Set NOT NULL (INPLACE — allows concurrent DML, brief metadata lock only)

Level 2: Brief Lock

  • Lock duration: 100ms – 5s (index builds, FK validation on small-medium tables)
  • Affected operations: Writes may queue briefly
  • Frontend state: Show subtle migration indicator, don't lock the UI
  • Data endpoints: May see brief latency spike, no errors expected

PostgreSQL operations:

  • Add UNIQUE constraint (ShareLock — blocks writes during index build)
  • Add FOREIGN KEY constraint (ShareRowExclusive — blocks writes on both tables)
  • Add nullable column with DEFAULT (PG 11+, instant but AccessExclusive)

MySQL operations:

  • Add UNIQUE index (INPLACE — brief metadata lock at start/end, concurrent DML allowed during build)
  • Add FOREIGN KEY (INPLACE — brief metadata lock, concurrent DML allowed)
  • Add column (INPLACE, pre-8.0.12 — table rebuild but concurrent DML allowed)
  • Drop column (INPLACE, pre-8.0.29 — table rebuild but concurrent DML allowed)

Level 3: Blocking

  • Lock duration: 5s – minutes (table rewrite or full scan on large tables)
  • Frontend state: Lock affected view(s), show migration progress, disable editing
  • Data endpoints: Return 503 Service Unavailable with Retry-After header

PostgreSQL — blocks ALL reads and writes:

  • ALTER COLUMN TYPE (type change) — full table rewrite
  • SET NOT NULL — full table scan to validate no NULLs exist
  • Add column with DEFAULT on PG < 11

MySQL — blocks DML but allows SELECT (COPY algorithm):

  • MODIFY COLUMN TYPE (type change) — full table copy, reads still work
  • Converting between incompatible charsets

Vendor-Specific Frontend Behavior for Level 3

Because MySQL's COPY algorithm still allows SELECT:

| State | PostgreSQL | MySQL |
| --- | --- | --- |
| SELECT / scrolling | Blocked — show overlay | Allowed — read-only mode |
| INSERT / UPDATE / DELETE | Blocked | Blocked |
| Frontend state | Full migration overlay | Read-only banner + disable editing |

This means the blocksReads field in the impact analysis is critical — it's true for PostgreSQL type changes but false for MySQL type changes.
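A minimal predicate capturing this split (hypothetical helper, not existing code):

```java
// Hypothetical helper: whether a Level 3 rewrite blocks reads.
// True for PostgreSQL (AccessExclusive held for the whole rewrite),
// false for MySQL's COPY algorithm (SELECT still permitted).
class ReadBlockCheck {
    static boolean blocksReads(String vendor, boolean requiresTableRewrite) {
        if (!requiresTableRewrite) return false;
        return "POSTGRESQL".equals(vendor);
    }
}
```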

MySQL Storage Engine Considerations

The lock behavior above assumes InnoDB (the default since MySQL 5.5). Other storage engines have fundamentally different DDL behavior.

MyISAM

MyISAM has no Online DDL support — every ALTER TABLE performs a full table copy with a table-level lock that blocks all reads AND writes for the entire duration. There is no INSTANT or INPLACE algorithm.

| Operation | MyISAM Behavior | InnoDB Equivalent |
| --- | --- | --- |
| Any ALTER TABLE | Full copy + table lock | Varies (INSTANT/INPLACE/COPY) |
| Concurrent SELECT | Blocked | Usually allowed |
| Concurrent INSERT/UPDATE | Blocked | Depends on algorithm |

MEMORY, ARCHIVE, CSV, etc.

These engines also lack Online DDL. All ALTER TABLE operations are full-copy with table locks.

What the System Must Do

  1. Detect the storage engine per table: SELECT ENGINE FROM information_schema.TABLES WHERE TABLE_SCHEMA = ? AND TABLE_NAME = ?
  2. If not InnoDB: Classify ALL migrations as Level 3 (BLOCKING, blocks reads + writes) regardless of operation type
  3. Recommend InnoDB: Show a warning in the admin UI if non-InnoDB tables are detected — "This table uses MyISAM, which blocks all operations during schema changes. Consider converting to InnoDB for better concurrent access."
  4. Store engine per view in metadata: Cache the storage engine in View or ColumnMetadata so impact analysis doesn't need a live DB query every time
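The override rule from step 2 is simple enough to state as code — a sketch, with engine strings as they appear in `information_schema.TABLES.ENGINE`:

```java
// Sketch of the engine override: anything that is not InnoDB is forced
// to Level 3, regardless of what the InnoDB classification would be.
class EngineOverride {
    static boolean supportsOnlineDdl(String engine) {
        return "InnoDB".equalsIgnoreCase(engine);
    }

    // Hypothetical hook into impact classification.
    static String effectiveLevel(String engine, String innoDbLevel) {
        return supportsOnlineDdl(engine) ? innoDbLevel : "BLOCKING";
    }
}
```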

Engine Detection Query

```sql
-- Per table
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'workspace_db'
AND TABLE_NAME = 'products';

-- All tables in workspace
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'workspace_db'
AND TABLE_TYPE = 'BASE TABLE';
```

This should be cached during schema sync and stored in workspace metadata. Re-checked on each sync or drift detection.


Impact Analysis: What the System Must Know

When a migration is detected (in ViewColumnService.updateWithSchemaDetection()), the system must compute a MigrationImpact before queuing:

MigrationImpact {
    // Classification
    level: TRANSPARENT | BRIEF | BLOCKING
    estimatedDurationMs: long

    // Vendor context
    databaseVendor: POSTGRESQL | MYSQL
    mysqlVersion: string?             // e.g. "8.0.35" — affects INSTANT availability
    storageEngine: string?            // e.g. "InnoDB", "MyISAM" — MySQL only

    // Affected scope
    primaryTable: string              // The table being altered
    primaryViewUuids: UUID[]          // All views backed by this table

    // FK cascade analysis
    affectedFkTables: FkImpact[]      // Tables with FK references to/from primaryTable
    totalAffectedTables: int

    // Lock analysis
    lockType: string                  // AccessExclusive, ShareLock, ShareRowExclusive
    blocksReads: boolean
    blocksWrites: boolean
    requiresTableRewrite: boolean

    // Row count context
    estimatedRowCount: long           // From pg_stat_user_tables.n_live_tup (PG) or information_schema TABLE_ROWS (MySQL)

    // Per-operation breakdown (a single PATCH can trigger multiple changes)
    operations: MigrationOperation[]
}

MigrationOperation {
    description: string               // "Change type VARCHAR → INTEGER"
    sqlPreview: string                 // "ALTER TABLE users ALTER COLUMN age TYPE INTEGER USING age::INTEGER"
    lockType: string
    estimatedDurationMs: long
    requiresTableRewrite: boolean
}

FkImpact {
    table: string
    viewUuids: UUID[]
    fkConstraint: string              // Constraint name
    cascadeType: string               // RESTRICT, CASCADE, SET NULL, NO ACTION
    direction: REFERENCES | REFERENCED_BY
    impactOnMigration: string         // "FK validation required" / "CASCADE may propagate lock"
    blocksReads: boolean
    blocksWrites: boolean
}

Row Count Thresholds

Estimated duration depends on row count. These thresholds determine the impact level for table-rewrite operations:

| Row Count | Type Change Duration (est.) | Impact Level |
| --- | --- | --- |
| < 10,000 | < 1s | Brief |
| 10,000 – 100,000 | 1s – 10s | Blocking |
| 100,000 – 1,000,000 | 10s – 60s | Blocking (warn) |
| > 1,000,000 | > 60s | Blocking (require confirmation) |

Row counts come from pg_stat_user_tables.n_live_tup on PostgreSQL and information_schema.TABLES.TABLE_ROWS on MySQL (approximate but fast — no table scan needed).
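The threshold table translates directly into code. A sketch — the 0.1 ms/row rate is an illustrative assumption; the real rate is benchmarked from historical migrations in Phase 6:

```java
// Sketch of the row-count thresholds for table-rewrite operations.
// MS_PER_ROW is an assumed placeholder, refined from history later.
class RewriteEstimate {
    static final double MS_PER_ROW = 0.1;  // assumption, not a benchmark

    static long estimateMs(long rows) {
        return (long) (rows * MS_PER_ROW);
    }

    static String levelFor(long rows) {
        if (rows < 10_000) return "BRIEF";
        if (rows < 100_000) return "BLOCKING";
        if (rows < 1_000_000) return "BLOCKING_WARN";
        return "BLOCKING_CONFIRM";
    }
}
```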


FK Cascade Analysis

Foreign keys create cross-table dependencies that extend a migration's blast radius.

Scenario: Migration on a Referenced Table

Table categories is referenced by products.category_id (FK).

Adding NOT NULL to categories.name:

  • Lock on categories only
  • products is unaffected (FK points FROM products TO categories)
  • But: INSERT/UPDATE on products that references categories will block if the category table is locked

Dropping categories.id (referenced column):

  • Depends on FK cascade rule:
    • RESTRICT / NO ACTION: Migration fails immediately (FK violation)
    • CASCADE: Drops the FK constraint on products too — both tables affected
    • SET NULL: Sets products.category_id to NULL for affected rows — writes to products

Scenario: Migration on a Referencing Table

Table products has FK category_id → categories.id.

Changing products.category_id type:

  • Must drop FK first, alter column, recreate FK
  • Lock on products (table rewrite for type change)
  • Brief lock on categories (FK recreation validates references)
  • Both tables affected

Scenario: Cascade Chain

order_items → orders → customers

Migration on customers.id type with CASCADE:

  • customers: table rewrite (type change)
  • orders: FK must be dropped and recreated (validates all rows)
  • order_items: FK must be dropped and recreated (validates all rows)
  • All three tables locked at various points

What the System Must Compute

For every migration, walk the FK graph in both directions:

  1. Outgoing FKs (this table references others): Will the migration break or need to drop/recreate these?
  2. Incoming FKs (other tables reference this one): Will the migration cascade? What's the cascade rule?
  3. Transitive FKs (chains): If cascade propagates, follow the chain
  4. For each affected table: Which views are backed by it? How many rows?
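The bidirectional walk is a plain graph traversal. A sketch where the FK graph is reduced to an adjacency map (table → FK-connected tables, both directions merged); the real implementation would build this from ColumnMetadata and information_schema:

```java
import java.util.*;

// Sketch of the FK blast-radius computation: breadth-first walk over a
// merged (outgoing + incoming) FK adjacency map, starting from the
// table being migrated.
class FkBlastRadius {
    static Set<String> affectedTables(String start, Map<String, List<String>> fkGraph) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            String table = queue.poll();
            if (!seen.add(table)) continue;  // already visited
            for (String next : fkGraph.getOrDefault(table, List.of())) {
                if (!seen.contains(next)) queue.add(next);
            }
        }
        return seen;
    }
}
```

For the cascade-chain scenario above (order_items → orders → customers), walking from `customers` returns all three tables.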

Endpoint Behavior During Migrations

Migration State per Table

The metadata DB tracks active migrations per table:

active_migration {
    workspace_id: long
    table_name: string
    column_uuid: UUID
    impact_level: TRANSPARENT | BRIEF | BLOCKING
    blocks_reads: boolean
    blocks_writes: boolean
    affected_tables: string[]         // Including FK cascade targets
    started_at: timestamp
    estimated_completion: timestamp
    operation_id: string              // Links to TaskCompletionMessage
}

Data Endpoints (/api/data/...)

| Migration Level | SELECT (read) | INSERT | UPDATE | DELETE | Bulk Operations |
| --- | --- | --- | --- | --- | --- |
| No migration | Normal | Normal | Normal | Normal | Normal |
| TRANSPARENT | Normal | Normal | Normal | Normal | Normal |
| BRIEF | Normal | Normal (may queue 1-5s) | Normal (may queue) | Normal (may queue) | Normal |
| BLOCKING + blocks_reads | 503 + Retry-After | 503 | 503 | 503 | 503 |
| BLOCKING + writes only | Normal | 503 | 503 | 503 | 503 |

Response for 503:

```json
{
    "error": "schema_migration_in_progress",
    "message": "A schema migration is in progress on this table. Please retry shortly.",
    "table": "products",
    "operationId": "abc-123",
    "estimatedCompletionAt": "2026-03-16T20:35:00Z",
    "retryAfterSeconds": 15,
    "affectedOperations": ["INSERT", "UPDATE", "DELETE"]
}
```

Workspace API (auto-generated REST)

Same behavior as data endpoints — the workspace API hits the same tables. External consumers (Zapier, n8n, etc.) must handle 503 + Retry-After gracefully.
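"Gracefully" means: wait the number of seconds the server names, and only fall back to client-side backoff when the header is missing. A sketch of that scheduling logic (pure policy, HTTP plumbing elided):

```java
// Sketch of a consumer-side retry policy for 503 + Retry-After.
// Prefer the server's hint; otherwise use capped exponential backoff.
class RetryAfterPolicy {
    static long waitSeconds(Integer retryAfterHeader, int attempt) {
        if (retryAfterHeader != null && retryAfterHeader > 0) {
            return retryAfterHeader;  // server knows the migration ETA
        }
        return 1L << Math.min(attempt, 5);  // 1, 2, 4, ... capped at 32s
    }
}
```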

MCP API

Same behavior. AI assistants calling MCP tools should see:

  • Tool call result: "error": "schema_migration_in_progress" with context
  • The AI can inform the user and retry after the estimated completion time

Implementation: Request Interceptors

Rather than adding migration checks to every endpoint, use framework-level interceptors that check the active_migration table before the request reaches the endpoint.

Quarkus (Data Endpoints) — JAX-RS ContainerRequestFilter

```java
import java.io.IOException;

import jakarta.annotation.Priority;
import jakarta.inject.Inject;
import jakarta.ws.rs.Priorities;
import jakarta.ws.rs.container.ContainerRequestContext;
import jakarta.ws.rs.container.ContainerRequestFilter;
import jakarta.ws.rs.core.Response;
import jakarta.ws.rs.ext.Provider;

@Provider
@Priority(Priorities.AUTHORIZATION + 10)  // Run after auth, before endpoint
public class MigrationGuardFilter implements ContainerRequestFilter {

    @Inject
    ActiveMigrationRepository activeMigrationRepository;

    @Override
    public void filter(ContainerRequestContext ctx) throws IOException {
        // Extract table/view from request path (e.g. /api/data/{viewUuid}/rows)
        String viewUuid = extractViewUuid(ctx);
        if (viewUuid == null) return;  // Not a data endpoint

        String method = ctx.getMethod();
        boolean isRead = "GET".equals(method) || "HEAD".equals(method);

        ActiveMigration migration = activeMigrationRepository
            .findBlockingByViewUuid(viewUuid);
        if (migration == null) return;  // No active migration

        // Check if this operation type is blocked
        if (isRead && !migration.isBlocksReads()) return;   // Reads allowed
        if (!isRead && !migration.isBlocksWrites()) return; // Writes allowed

        // Block the request
        ctx.abortWith(Response.status(503)
            .header("Retry-After", migration.getRetryAfterSeconds())
            .entity(migration.toErrorResponse())
            .build());
    }

    private String extractViewUuid(ContainerRequestContext ctx) {
        // Simplified: read the viewUuid path parameter; null for non-data endpoints
        return ctx.getUriInfo().getPathParameters().getFirst("viewUuid");
    }
}
```

Covers: /api/data/... endpoints (cell edits, row inserts, bulk operations, selects)

Spring Boot (Workspace API + MCP API) — HandlerInterceptor

```java
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;

import org.springframework.stereotype.Component;
import org.springframework.web.servlet.HandlerInterceptor;

@Component
public class MigrationGuardInterceptor implements HandlerInterceptor {

    private final ActiveMigrationService activeMigrationService;

    public MigrationGuardInterceptor(ActiveMigrationService activeMigrationService) {
        this.activeMigrationService = activeMigrationService;
    }

    @Override
    public boolean preHandle(HttpServletRequest request,
                             HttpServletResponse response,
                             Object handler) throws Exception {
        // Extract workspace + table from request path (helpers elided)
        String tableName = extractTableName(request);
        Long workspaceId = extractWorkspaceId(request);
        if (tableName == null || workspaceId == null) return true;

        boolean isRead = "GET".equals(request.getMethod());

        ActiveMigration migration = activeMigrationService
            .findBlocking(workspaceId, tableName);
        if (migration == null) return true;

        if (isRead && !migration.isBlocksReads()) return true;

        response.setStatus(503);
        response.setHeader("Retry-After",
            String.valueOf(migration.getRetryAfterSeconds()));
        response.setContentType("application/json");
        response.getWriter().write(migration.toErrorJson());
        return false;  // Short-circuit — don't invoke endpoint
    }
}
```

Covers: Workspace REST API (/api/workspace/{slug}/...) and MCP tool calls

Why Interceptors, Not Per-Endpoint Checks

  • Single point of enforcement — every current and future data endpoint is protected automatically
  • No code duplication — migration check logic lives in one place
  • Cannot be forgotten — new endpoints get migration protection by default
  • Fast path — active_migration lookup is a simple indexed query, cached in-memory for the duration of a migration
  • Clean separation — endpoints don't need to know about migrations at all

Caching the Active Migration State

The active_migration table will be queried on every data request during a migration. To avoid per-request DB hits:

  • Cache active migrations in-memory per workspace (small dataset — typically 0-1 active migrations)
  • Invalidate on TaskCompletionMessage (migration success/failure)
  • TTL fallback: expire cache after 5 minutes to handle missed completion messages
  • On Quarkus side: use CDI @ApplicationScoped bean with ConcurrentHashMap
  • On Spring Boot side: use ConcurrentHashMap in the interceptor service
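The cache described above could be sketched as follows — a simplified stand-in where the real ActiveMigration entity is reduced to a String, and time is passed explicitly so the TTL behavior is visible:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the in-memory active-migration cache with TTL fallback.
// Invalidation normally happens on TaskCompletionMessage; the TTL
// covers missed completion messages.
class MigrationCache {
    private record Entry(String migration, long expiresAtMillis) {}

    private final Map<Long, Entry> byWorkspace = new ConcurrentHashMap<>();
    private final long ttlMillis;

    MigrationCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    void put(long workspaceId, String migration, long nowMillis) {
        byWorkspace.put(workspaceId, new Entry(migration, nowMillis + ttlMillis));
    }

    // Called on TaskCompletionMessage (success or failure).
    void invalidate(long workspaceId) { byWorkspace.remove(workspaceId); }

    // TTL fallback: an entry past its deadline is treated as gone.
    String get(long workspaceId, long nowMillis) {
        Entry e = byWorkspace.get(workspaceId);
        if (e == null || nowMillis >= e.expiresAtMillis) {
            byWorkspace.remove(workspaceId);
            return null;
        }
        return e.migration;
    }
}
```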

Frontend State Machine

Each view in the frontend should track migration state and adjust the UI accordingly.

States

NORMAL

    ├── view.column.schema.changing (TRANSPARENT)
    │       → No visible change, migration completes instantly
    │       → Transition to NORMAL on view.column.schema.changed

    ├── view.column.schema.changing (BRIEF)
    │       → Show subtle indicator on affected column header
    │       → Disable editing for that column only
    │       → Other columns remain editable
    │       → Transition to NORMAL on view.column.schema.changed

    ├── view.column.schema.changing (BLOCKING, writes only)
    │       → Show migration banner: "Schema migration in progress — editing disabled"
    │       → Disable all INSERT/UPDATE/DELETE on affected view(s)
    │       → SELECT/scrolling still works
    │       → Show progress indicator with estimated time
    │       → Transition to NORMAL on view.column.schema.changed

    └── view.column.schema.changing (BLOCKING, reads + writes)
            → Show full migration overlay: "Schema migration in progress — please wait"
            → Disable all operations on affected view(s)
            → Show progress indicator with estimated time
            → Transition to NORMAL on view.column.schema.changed

Multi-View Impact

When FK cascade analysis shows multiple tables affected:

  1. The SSE event includes affectedViewUuids[]
  2. Frontend locks ALL listed views, not just the one where the migration was triggered
  3. Each affected view shows a banner: "Related table {tableName} is being migrated"

What the SSE Event Must Carry

The view.column.schema.changing event needs to be extended:

```typescript
interface SchemaChangingEvent {
    // Existing fields
    columnUuid: string;
    columnName: string;
    viewUuid: string;
    reason: string;

    // New: impact analysis
    impactLevel: 'TRANSPARENT' | 'BRIEF' | 'BLOCKING';
    blocksReads: boolean;
    blocksWrites: boolean;
    estimatedDurationMs: number;
    estimatedCompletionAt: string;          // ISO 8601

    // New: cross-table impact
    affectedViewUuids: string[];            // All views that should show migration state
    affectedTables: string[];               // For admin visibility

    // New: operation detail
    operations: {
        description: string;
        requiresTableRewrite: boolean;
    }[];
}
```

Echo Filtering for Migration Events

Migration lock events should NOT be echo-filtered. The originating client also needs to show the migration state — they triggered it, and their view should reflect the lock too. Only the view.column.updated event (the metadata change) should be echo-filtered.


Pre-Migration Confirmation Flow

For BLOCKING migrations on tables with significant row counts, the system should require explicit confirmation.

Flow

Admin changes column type VARCHAR → INTEGER


Backend computes MigrationImpact

    ├── Level: BLOCKING
    ├── Estimated duration: 45s
    ├── Blocks: reads + writes
    ├── Row count: 500,000
    ├── FK impact: 2 related tables (orders, order_items)
    ├── Total affected views: 3


Response: 200 OK (dry-run)
{
    "dryRun": true,
    "impact": { ...full MigrationImpact... },
    "confirmationRequired": true,
    "message": "This migration will rewrite 500,000 rows and block 3 views for ~45 seconds. 2 related tables will also be affected.",
    "confirmUrl": "/api/columns/item/{uuid}/confirm-migration",
    "confirmToken": "one-time-token"
}


Frontend shows confirmation dialog with full impact details


Admin confirms


POST /api/columns/item/{uuid}/confirm-migration
{ "confirmToken": "one-time-token" }


Backend queues the migration (202 Accepted)

When to Require Confirmation

| Condition | Require Confirmation? |
| --- | --- |
| TRANSPARENT migration | No |
| BRIEF migration, < 50k rows | No |
| BRIEF migration, > 50k rows | Optional (show warning, don't block) |
| BLOCKING migration, < 10k rows | No (fast enough) |
| BLOCKING migration, 10k–100k rows | Yes |
| BLOCKING migration, > 100k rows | Yes + strong warning |
| Any migration with FK cascade to large tables | Yes |
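The confirmation matrix reduces to a small predicate — a sketch with hypothetical names (the "strong warning" and "optional warning" tiers are left to the frontend):

```java
// Sketch of the confirmation-required decision. Only the hard
// "must confirm" rule is modeled; warning tiers are UI concerns.
class ConfirmationPolicy {
    static boolean required(String level, long rows, boolean fkCascadeToLargeTable) {
        if (fkCascadeToLargeTable) return true;          // always confirm
        if ("BLOCKING".equals(level)) return rows >= 10_000;
        return false;  // TRANSPARENT and BRIEF never hard-require confirmation
    }
}
```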

Concurrent Migration Prevention

Only one BLOCKING migration should run per table at a time.

Check Before Accepting

```java
// In ViewColumnService.updateWithSchemaDetection()
if (impact.level == BLOCKING) {
    ActiveMigration existing = activeMigrationRepository
        .findByTableName(impact.primaryTable);
    if (existing != null) {
        // Mapped to a 409 Conflict response
        throw new ConflictException(
            "A migration is already in progress on table '" +
            impact.primaryTable + "'. Please wait for it to complete."
        );
    }
}
```

TRANSPARENT and BRIEF migrations: Allow concurrent

These are fast enough that serialization isn't worth the complexity. PostgreSQL's own lock management handles it.

BLOCKING migrations: One at a time per table

Enforced by the active_migration table with a unique constraint on (workspace_id, table_name) where completed_at IS NULL.


Implementation Phases

Phase 1: Vendor & Engine Detection (foundation)

  • Detect database vendor (PostgreSQL vs MySQL) — already available in workspace config
  • Query MySQL version (SELECT VERSION()) and cache per workspace
  • Query storage engine per table (information_schema.TABLES) during schema sync
  • Store vendor, version, and storage engine in workspace/view metadata
  • Warn admins if non-InnoDB tables are detected

Phase 2: Migration Impact Analysis

  • Build vendor-aware operation classifier (same operation → different impact per vendor + engine) ✅ Done — MigrationImpactAnalyzer with PostgreSQL, MySQL InnoDB, and non-InnoDB classifiers
  • Compute MigrationImpact from detected schema changes ✅ Done — MigrationImpactDTO with ImpactLevel (TRANSPARENT/BRIEF/BLOCKING), per-operation breakdown
  • Return impact data in ColumnUpdateResultDTO for 202 responses ✅ Done — wired into ViewColumnService.updateWithSchemaDetection()
  • Extend view.column.schema.changing SSE event with impact fields ✅ Done — impactLevel, blocksReads, blocksWrites, estimatedDurationMs
  • Add row count estimation: pg_stat_user_tables.n_live_tup (PostgreSQL) or information_schema.TABLES.TABLE_ROWS (MySQL) ✅ Done — TableStatsService queries workspace DB via JDBC on worker thread
  • Query MySQL version (SELECT VERSION()) and cache per workspace ✅ Done — TableStatsService.getMysqlVersion()
  • Query storage engine per table (information_schema.TABLES) ✅ Done — TableStatsService.getMysqlStorageEngine()

Phase 3: Data Endpoint Protection

  • Add active_migration table to metadata DB (Liquibase changeset) ✅ Done — changeset metadata-1.50
  • Write migration state on queue (before sending to processor) ✅ Done — ActiveMigrationService.startMigration() called from ViewColumnService
  • Clear migration state on TaskCompletionMessage (success or failure) ✅ Done — completeMigrationByColumnUuid() called from ColumnOperationHandler
  • Implement MigrationGuardFilter (JAX-RS ContainerRequestFilter) for Quarkus data endpoints ✅ Done — intercepts /api/data/ paths, checks viewUuid against cache
  • Implement MigrationGuardInterceptor (Spring HandlerInterceptor) for workspace API ✅ Done — intercepts /api/v1/**, extracts table name from URL, checks via ActiveMigrationJdbcService
  • Add MCP migration guard ✅ Done — McpAccessGuard.requireNoActiveMigration() protects all DataTools (query, get, create, update)
  • Add in-memory cache for active migration state with TTL fallback ✅ Done — Quarkus: ConcurrentHashMap with 5-min TTL; Spring Boot: Caffeine cache with 5-sec TTL
  • Return 503 + Retry-After with structured error response ✅ Done — SCHEMA_MIGRATION_IN_PROGRESS error code with retryAfterSeconds, blocksReads, blocksWrites

Phase 4: FK Cascade Analysis

  • Build FK graph from ColumnMetadata relationships and information_schema
  • Walk the graph in both directions for each migration to find all affected tables
  • Account for cascade rules (RESTRICT, CASCADE, SET NULL, NO ACTION)
  • Include affected tables in impact analysis and SSE events
  • Extend interceptor to check FK-affected tables, not just the primary table

Phase 5: Pre-Migration Confirmation

  • Add dry-run mode to the PATCH endpoint
  • Generate confirmation tokens with TTL
  • Add POST .../confirm-migration endpoint
  • Frontend confirmation dialog with full impact visualization (affected views, row counts, estimated duration, FK cascade)

Phase 6: Duration Estimation and Progress

  • Track actual migration durations to improve estimates over time ✅ Done — MigrationDurationTracker records estimated vs actual duration in migration_duration_history table
  • Benchmark table rewrite speed per row ✅ Done — MigrationDurationTracker.getRefinedMsPerRow() computes average ms/row from historical data, falling back to static defaults
  • Add progress reporting from processor ✅ Done — MigrationProgressPoller polls pg_stat_progress_alter_table (PG 12+) every 2s via second JDBC connection; falls back to time-based estimates for MySQL/older PG
  • Real-time progress in SSE events ✅ Done — MigrationProgressMessage → MigrationProgressConsumer → view.column.schema.progress SSE event with percent, phase, elapsed, rows processed
  • MySQL: no equivalent progress view ✅ Done — time-based progress estimation used automatically when pg_stat_progress_alter_table is unavailable
  • Wire getRefinedMsPerRow() into MigrationImpactAnalyzer ✅ Done — MigrationImpactAnalyzer.analyze() now accepts refined rates; ViewColumnService pre-fetches workspace-specific rates via MigrationDurationTracker
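The time-based fallback mentioned in Phase 6 can be sketched as a pure function; clamping at 99% (so the bar never claims completion before the TaskCompletionMessage arrives) is an assumption of this sketch:

```java
// Sketch of time-based progress estimation, used when
// pg_stat_progress_alter_table is unavailable (MySQL, older PG).
class TimeBasedProgress {
    static int percent(long startedAtMs, long nowMs, long estimatedDurationMs) {
        if (estimatedDurationMs <= 0) return 0;
        long elapsed = Math.max(0, nowMs - startedAtMs);
        long pct = elapsed * 100 / estimatedDurationMs;
        return (int) Math.min(pct, 99);  // never report "done" from a clock alone
    }
}
```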

Open Questions

  1. Should TRANSPARENT migrations still send schema.changing events? They complete instantly — the event might arrive after the schema.changed event. Consider skipping the lock event for Level 1.
  2. Timeout handling: What if a BLOCKING migration takes longer than estimated? Should the system extend the Retry-After, or should there be a hard timeout with auto-rollback?
  3. Processor health: What if the processor crashes mid-migration? The active_migration record stays forever. Need a TTL or heartbeat mechanism.
  4. Read replicas: If the workspace database has read replicas, SELECTs could be routed there during migrations. Out of scope for now but worth noting.
  5. Online schema change tools: For very large tables (> 10M rows), tools like pg_repack or pgroll (PostgreSQL) or gh-ost / pt-online-schema-change (MySQL) can do non-blocking schema changes. Worth evaluating for the ZFS-hosted tier.
  6. MyISAM conversion: Should the system offer a one-click "Convert to InnoDB" for MyISAM tables? This is itself a blocking migration, but a one-time improvement.
  7. MySQL ALGORITHM hint: Should the processor explicitly request ALGORITHM=INSTANT or ALGORITHM=INPLACE when available, to fail fast if MySQL would fall back to COPY? This prevents silent performance degradation.

SchemaStack Internal Developer Documentation