5 min read

Partnerships

From unstable pipelines to controlled orchestration: a real-world approach with qibb

Acheron

Scaling media workflows tends to fail in predictable ways. You increase concurrency, push more data through the system, and instead of higher throughput, you get instability, including API limits, database bottlenecks, and pipelines that collapse under load.

‍

We ran into exactly this problem during a large-scale media migration project involving high-resolution video assets and complex metadata processing. The goal was straightforward: move and enrich large volumes of media data efficiently. The reality was anything but.

‍

In this post, we’ll walk through how we approached the problem, and how using qibb as the orchestration layer helped us move from unstable parallel processing to controlled, predictable execution.

The challenge: parallel processing without control

At scale, parallel processing introduces more problems than it solves if left unchecked. In our case, increasing concurrency didn’t improve throughput. It exposed weaknesses across the entire system. APIs began enforcing rate limits, serverless functions were throttled, and the database struggled to keep up with the volume of concurrent operations. What initially looked like a scalable architecture quickly became unreliable in production.

‍

The issue wasn’t a lack of compute. It was a lack of control over how work moved through the system.

Our approach: controlled orchestration with qibb

Instead of pushing for maximum speed, we shifted focus toward controlling execution. qibb acted as the orchestration layer that allowed us to regulate how data entered, moved through, and exited the system.

‍

Rather than treating the pipeline as a single continuous flow, we broke it into smaller, coordinated units of work. This made it possible to manage concurrency more deliberately, distribute processing across independent applications, and ensure that failures in one part of the system didn’t cascade into others.

The result was a system that traded raw speed for something far more valuable in production: predictability.

Architecture overview

The system was designed as a set of coordinated applications rather than a single monolithic pipeline. An initiator component acted as the control layer, reading a limited batch of file references and introducing them into the system in controlled segments.

Once a batch was identified, it was split into two parallel processing streams. Each stream operated independently within qibb, handling its own subset of files from ingestion through transformation to delivery. A synchronization step ensured that both streams completed before the next batch was triggered.

‍

This fan-out and fan-in pattern created a rhythm in execution. Instead of uncontrolled parallelism, the system operated in predictable cycles, balancing throughput with stability. Distributing workloads across separate applications also reduced memory pressure and improved runtime behavior under load.

Where performance broke and how we tuned it

Improving performance wasn’t about scaling everything up. It was about tuning the system at the right points.

‍

Batch size turned out to be one of the most important levers. Smaller batches created unnecessary overhead, while larger ones increased the likelihood of timeouts and made failures more expensive to recover from. Finding the right balance required aligning batch size with processing latency and payload complexity.

‍

The queue layer also needed adjustment. Without constraints, it flooded downstream systems with work. By aligning visibility timeouts with actual processing durations and limiting consumption rates, we were able to smooth out execution and avoid bottlenecks.

‍

Compute scaling presented a similar challenge. Increasing concurrency alone didn’t improve performance. It amplified instability. More effective gains came from limiting concurrency, optimizing execution time, and tuning memory allocation.

The database quickly became a bottleneck under load. This was addressed by introducing connection pooling, optimizing queries, and shifting from per-record operations to batch-based writes.

‍

External APIs added another layer of variability. Rate limiting and exponential backoff ensured that these dependencies didn’t destabilize the pipeline when they slowed down or failed.

Handling failures without breaking the system

At this scale, failures are inevitable. The goal is to contain them rather than eliminate them entirely.

‍

Each processing unit was designed to operate independently, which meant that a failure in one record didn’t affect others. Errors were categorized to make diagnosis easier, and retries were applied selectively using exponential backoff to avoid overwhelming the system.

‍

Failed records were routed into a separate recovery path, allowing them to be analyzed and reprocessed without interrupting the main pipeline. This approach kept the system running even when individual components encountered issues.

‍

Observability and control

Visibility played a critical role in stabilizing the system. With centralized monitoring in place, it became possible to track progress across batches, identify where bottlenecks were forming, and understand failure patterns in real time.

‍

Instead of reacting to issues after they caused disruptions, the team could proactively adjust execution parameters based on what the system was doing. This made large-scale processing far more manageable and significantly reduced operational uncertainty.

The outcome

With controlled orchestration in place, the system moved from unstable to predictable. More importantly, the improvements were measurable.

‍

Processing throughput increased significantly. Before optimization, processing 1,000 records from an 800K-record dataset took between five and six minutes. After introducing controlled parallelism with two independent batch applications and tuning database connection pooling, the same workload completed in one to one and a half minutes. This resulted in a four to five times speed improvement on the same infrastructure.

‍

Failure and retry behavior also improved. By introducing structured error categorization, isolating processing units, and applying controlled exponential backoff, cascading failures were eliminated. Retry storms, which are a common issue in untuned parallel pipelines, were no longer a factor.

‍

The database, which had previously been a major bottleneck, was stabilized through a combination of connection pooling, query optimization, and a shift from per-record writes to batch-based operations. Without these changes, increased parallelism would have simply shifted the bottleneck rather than resolving it.

‍

Operationally, the system became far easier to manage. With centralized monitoring and file-level checkpointing in place, progress could be tracked in real time and processing could resume safely from any failure point without reprocessing completed data. This reduced the need for manual intervention and increased overall confidence in running the pipeline at scale.

Final takeaway

Parallel processing alone doesn’t solve scale problems. In many cases, it creates them.

What matters is how execution is controlled across systems. By introducing orchestration with qibb and focusing on controlled parallelism, it’s possible to build pipelines that scale without sacrificing stability.

About acheron software consultancy

Acheron Software Consultancy specializes in designing and implementing scalable data and media workflows. As a qibb implementation partner, they help organizations build reliable, production-ready systems for complex media processing and migration challenges.

‍

Get started with
qibb today

Automate, connect, and scale your media workflows faster.See what qibb can do for your team today.

Book a demo

Start free trial