Bristol Myers Squibb (BMS) is a global biopharmaceutical company whose mission is to discover, develop and deliver innovative medicines that help patients prevail over serious diseases. In late 2024 and early 2025, BMS used the DBOS Transact durable execution library to build S3Mirror, a genomic data pipeline application that, in BMS's benchmark, ran about 41x faster than AWS DataSync while adding durable execution and observability.
For clinical trials and translational research, biological samples are routinely submitted for genome sequencing. This sequencing is often performed by outside companies, which then share the results back. The sequencing provider uploads a batch of raw sequencing reads, in the form of gzipped FASTQ files, to an S3 bucket that it hosts and shares. BMS must then transfer the files to its own S3 bucket, both to process the raw sequencing data into analysis-ready data and to archive it in case the data needs to be re-interrogated at a later date. Since these results may be used for portfolio decisions or submission to the FDA or EMEA, the end-to-end process must be traceable and reproducible.
Read the paper authored by BMS: S3Mirror: Making Genomic Data Transfers Fast, Reliable, and Observable with DBOS
Data volume
The datasets are large. Currently, it is common to generate between 10GB and 100GB of data per sample while sequencing hundreds or even thousands of samples per batch. There may be several batches per week. At these volumes, transferring a single batch can take more than a day of wall-clock time. Thus, the desired solution should leverage many parallel requests to transfer data as quickly as possible.
Transfer errors
Any given S3 API call can fail with an intermittent error that is resolved on retry, and failures can also result from software or infrastructure crashes. Other errors require human attention, such as when files do not have appropriate S3 read permissions set. In those cases the transfer can succeed for some files but fail for others, and finding all the affected files is time-consuming work. Rerunning the process from the start after a failure is undesirable because it needlessly repeats expensive and time-consuming work. The desired solution should automatically retry to resolve intermittent errors, fail gracefully and with notification on errors that need human intervention, and, if interrupted, be able to resume a transfer without repeating completed files.
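The two error classes described above call for different handling. The following is a minimal sketch of that idea in plain Python, not the S3Mirror implementation: the exception classes, the `run_with_retries` helper and its default delays are all hypothetical names and values chosen for illustration.

```python
import time


class TransientError(Exception):
    """Hypothetical: an intermittent failure expected to clear on retry."""


class PermanentError(Exception):
    """Hypothetical: a failure needing human attention (e.g. no read permission)."""


def run_with_retries(task, max_attempts=3, base_delay=1.0, backoff=2.0,
                     sleep=time.sleep):
    """Call task(), retrying transient errors with exponential backoff.

    Permanent errors are not caught, so they propagate immediately and can
    be reported instead of being retried pointlessly.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(delay)
            delay *= backoff  # wait longer before each further attempt


# Usage: a simulated copy that fails twice, then succeeds.
calls = {"n": 0}


def flaky_copy():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("simulated intermittent error")
    return "copied"


delays = []  # record the waits instead of actually sleeping
result = run_with_retries(flaky_copy, sleep=delays.append)
```

Injecting `sleep` keeps the sketch fast to run; a real transfer would use the default `time.sleep`.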
Poor observability
If a transfer takes hours to complete, it is not reasonable to expect a human to monitor its status in real time. And, when transferring large numbers of files using automated, parallelized scripts, it may be difficult for a human observer to notice errors. Furthermore, given a large number of batch deliveries, identifying process failures and the corresponding set of logs for forensic examination can be tedious. If the transfer is somehow prematurely terminated as discussed above, the log of events could even be lost altogether. The desired tool should durably store the file-wise log of all successes and failures and make it observable during and long after the transfer.
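As an illustration of a durable, file-wise log, the sketch below records one row per file and answers the two questions an operator typically asks: how far along is the batch, and which files failed. It uses SQLite from the Python standard library purely as a stand-in for a persistent store; the table layout, function names and sample file names are assumptions made for this example.

```python
import sqlite3
import time


def open_log(path=":memory:"):
    """Open (or create) the log. Pass a file path for on-disk persistence."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS transfer_log (
               batch_id   TEXT NOT NULL,
               key        TEXT NOT NULL,
               status     TEXT NOT NULL,
               error      TEXT,
               updated_at REAL NOT NULL,
               PRIMARY KEY (batch_id, key))"""
    )
    return conn


def record(conn, batch_id, key, status, error=None):
    """Insert or update the outcome for one file; committed before returning."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO transfer_log VALUES (?, ?, ?, ?, ?)",
            (batch_id, key, status, error, time.time()),
        )


def summary(conn, batch_id):
    """Return {status: file_count} for one batch."""
    rows = conn.execute(
        "SELECT status, COUNT(*) FROM transfer_log "
        "WHERE batch_id = ? GROUP BY status",
        (batch_id,),
    )
    return dict(rows.fetchall())


def failures(conn, batch_id):
    """Return [(key, error)] for every failed file in the batch."""
    rows = conn.execute(
        "SELECT key, error FROM transfer_log "
        "WHERE batch_id = ? AND status = 'ERROR' ORDER BY key",
        (batch_id,),
    )
    return rows.fetchall()


# Usage with made-up file names.
log = open_log()
record(log, "batch-001", "sample_A_R1.fastq.gz", "SUCCESS")
record(log, "batch-001", "sample_A_R2.fastq.gz", "ERROR", "AccessDenied")
record(log, "batch-001", "sample_B_R1.fastq.gz", "SUCCESS")
print(summary(log, "batch-001"))
print(failures(log, "batch-001"))
```

Because each outcome is committed as soon as it is known, the log survives a crash of the transfer process and can be queried long after the batch has finished.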
BMS implemented S3Mirror as a DBOS Transact Python application on top of the AWS Boto3 Python SDK.
At the core of the S3Mirror architecture is a DBOS durable queue: distributed, backed by Postgres, and exposed through a lightweight application-layer interface. For each file, S3Mirror puts a DBOS step on the transfer queue and keeps a list of workflow handles to all the enqueued steps. Each step executes a Boto3 s3.copy call configured to leverage parallelism, and DBOS is configured to retry the step up to 3 times with exponential backoff on error.
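The fan-out pattern described above (one task per file, one handle per task, results gathered file by file) can be sketched with the Python standard library. This is only a shape-of-the-code illustration under stated assumptions: a thread pool stands in for the DBOS queue and is neither durable nor distributed, and `transfer_batch`, `copy_one` and `fake_copy` are hypothetical names, with `copy_one` taking the place of the Boto3 s3.copy step.

```python
from concurrent.futures import ThreadPoolExecutor


def transfer_batch(keys, copy_one, max_workers=8):
    """Submit one task per key and gather a per-file outcome.

    Returns {key: ("SUCCESS", result)} or {key: ("ERROR", message)}.
    A failure on one file never aborts the others.
    """
    outcomes = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Keep a handle for every submitted task, keyed by file.
        handles = {key: pool.submit(copy_one, key) for key in keys}
        for key, handle in handles.items():
            try:
                outcomes[key] = ("SUCCESS", handle.result())
            except Exception as exc:
                outcomes[key] = ("ERROR", str(exc))
    return outcomes


# Usage with a fake copy function: one file is unreadable.
def fake_copy(key):
    if key == "locked.fastq.gz":
        raise PermissionError("AccessDenied")
    return f"dst-bucket/{key}"


outcomes = transfer_batch(
    ["a.fastq.gz", "b.fastq.gz", "locked.fastq.gz"], fake_copy
)
```

Holding a handle per file is what makes file-wise reporting possible: the caller learns exactly which files succeeded and which need attention, rather than a single pass or fail result for the whole batch.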
DBOS automatically persists the state of the job in Postgres. If the job is interrupted, the state information in Postgres is used to resume execution exactly where it left off prior to the interruption. The state information also serves as an audit trail (observability). One can query the data (or use the DBOS Pro console) to view the status of the job and its file transfers.
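The resume behavior can be illustrated with a small checkpointing sketch: each step's output is stored under a stable identifier, and a rerun returns the stored output instead of executing the step again. This is a simplified model of the general technique, not DBOS's internals; SQLite stands in for Postgres, and `open_store`, `run_step` and the `copy:<key>` identifiers are assumptions made for the example.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the checkpoint store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS step_outputs ("
        "step_id TEXT PRIMARY KEY, output TEXT NOT NULL)"
    )
    return conn


def run_step(conn, step_id, fn, *args):
    """Run fn(*args) once; on later calls return the recorded output."""
    row = conn.execute(
        "SELECT output FROM step_outputs WHERE step_id = ?", (step_id,)
    ).fetchone()
    if row is not None:
        return json.loads(row[0])  # already completed: skip the work
    output = fn(*args)
    with conn:  # commit the checkpoint before reporting success
        conn.execute(
            "INSERT INTO step_outputs VALUES (?, ?)",
            (step_id, json.dumps(output)),
        )
    return output


# Usage: simulate an interruption after two of three files.
store = open_store()
copied = []  # tracks which copies were actually executed


def copy_file(key):
    copied.append(key)
    return {"key": key, "status": "SUCCESS"}


keys = ["a.fastq.gz", "b.fastq.gz", "c.fastq.gz"]

for key in keys[:2]:  # first run stops early
    run_step(store, f"copy:{key}", copy_file, key)

# The rerun walks the full list but only executes the missing step.
results = [run_step(store, f"copy:{key}", copy_file, key) for key in keys]
```

In this simplified model, a crash between finishing a copy and committing its checkpoint would cause that one step to run again on resume, so steps should be safe to repeat.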
View the S3Mirror code on GitHub
Besides durability and observability, DBOS also enhanced S3Mirror with greater performance and cost efficiency on AWS. BMS ran a performance benchmark on 12TB of genomic data in 448 files. They ran the S3Mirror app on DBOS Cloud, which automatically scales to more VMs in response to queue growth.
Bonus: 41x faster file transfers and lower AWS costs
S3Mirror running on DBOS Cloud took just 8.1 minutes to complete the transfers, versus the 5.6 hours required to transfer the files using AWS DataSync. This also reduced AWS costs from $183 to $0.10.
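The headline figure follows directly from the reported numbers, and the same inputs give a rough sense of aggregate throughput. The short calculation below assumes decimal units (1 TB = 10^12 bytes), which the benchmark description does not specify.

```python
files = 448
total_bytes = 12e12            # 12 TB, assuming 1 TB = 10**12 bytes
s3mirror_seconds = 8.1 * 60    # 8.1 minutes
datasync_seconds = 5.6 * 3600  # 5.6 hours

speedup = datasync_seconds / s3mirror_seconds
s3mirror_gb_per_s = total_bytes / s3mirror_seconds / 1e9
datasync_gb_per_s = total_bytes / datasync_seconds / 1e9
avg_file_gb = total_bytes / files / 1e9

print(f"speedup:             {speedup:.1f}x")                  # 41.5x
print(f"S3Mirror throughput: {s3mirror_gb_per_s:.1f} GB/s")    # 24.7 GB/s
print(f"DataSync throughput: {datasync_gb_per_s:.2f} GB/s")    # 0.60 GB/s
print(f"average file size:   {avg_file_gb:.1f} GB")            # 26.8 GB
```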
"The durable DBOS Queue abstraction is the centerpiece of our architecture, allowing us to meet the three challenges simultaneously: letting VM workers execute tasks in parallel, durably tracking tasks that need to be completed and making this information observable."