1 Abstract

During the past two decades, a large number of antibody (Ab) libraries have been constructed to meet the needs of drug discovery and diagnostic processes. The advent of next-generation sequencing (NGS) technology has enabled scientists to rigorously assess library size, quality, diversity and robustness at different stages of the construction process. However the currently available bioinformatic tools mainly focus on the analysis of the third complementarity-determining region (CDR3) of T/B cell receptors, while Ab library sequencing requires the generation of reads that span the entire variable domain of Ab chains. In addition, existing tools were not designed to efficiently share the analysis results with others and are limited to a single user setting. To address these challenges, an R package, abseqR, was developed to enable the visualization of analysis results of Ab library sequences via interactive and portable HTML reports. The core sequence analysis is carried out via a Python sister package, named abseqPy, which merges paired-end reads, annotates V-(D)-J germlines, calculates unique clonotypes, analyses primer specificity, facilitates the selection of best restriction enzymes, predicts frameshifts, and identifies functional clones. abseqR package provides high-level functions that generate visualization for alignment quality of germline genes, V-(D)-J germline distributions, V-J germline associations, stop codon hot-spots, rearrangement frames, diversity indices and diversity estimations. Most importantly, these analyses are seamlessly extrapolated to analyze multiple library repertoires simultaneously when multiple samples are present. This includes additional analyses such as inter-sample clonotype comparison and sample clustering based on clonotype frequency. Importantly, abseqR condenses these analytics into comprehensive standalone HTML reports to allow users to easily navigate and browse the qualities of their Ab libraries.

1.1 Software availability

1.2 Keywords

antibody libraries, repertoire sequencing, interactive reports, analytics sharing