
Clinical trials using ctDNA

DeciBio have a great interactive Tableau dashboard which you can use to browse and filter their analysis of 97 "laboratory biomarker analysis" immuno-oncology clinical trials; see: Diagnostic Biomarkers for Cancer Immunotherapy – Moving Beyond PD-L1. The raw data come from ClinicalTrials.gov, where a "ctDNA" search returns 50 trials, 40 of which are open.

Two of these trials are running in the UK. Investigators at The Royal Marsden are looking to measure the presence or absence of ctDNA post-CRT in EMVI-positive rectal cancer. And AstraZeneca are using ctDNA as a secondary outcome to obtain a preliminary assessment of the safety and efficacy of AZD0156, and of its activity in tumours, by evaluating the total amount of ctDNA.

You can also specify your own search terms and get back lists of trials from OpenTrials, which went live very recently. The Marsden's ctDNA trial above is currently listed.

You can use the DeciBio dashboard on their site. In the example below I filtered for trials using ctDNA analysis and came up with 7 results:

  1. Dabrafenib and Trametinib Followed by Ipilimumab and Nivolumab or Ipilimumab and Nivolumab Followed by Dabrafenib and Trametinib in Treating Patients With Stage III-IV BRAFV600 Melanoma
  2. Nivolumab in Eliminating Minimal Residual Disease and Preventing Relapse in Patients With Acute Myeloid Leukemia in Remission After Chemotherapy
  3. Nivolumab and Ipilimumab in Treating Patients With Advanced HIV Associated Solid Tumors
  4. Entinostat, Nivolumab, and Ipilimumab in Treating Patients With Solid Tumors That Are Metastatic or Cannot Be Removed by Surgery or Locally Advanced or Metastatic HER2-Negative Breast Cancer
  5. Nivolumab in Treating Patients With HTLV-Associated T-Cell Leukemia/Lymphoma
  6. Tremelimumab and Durvalumab With or Without Radiation Therapy in Patients With Relapsed Small Cell Lung Cancer
  7. Pembrolizumab, Letrozole, and Palbociclib in Treating Patients With Stage IV Estrogen Receptor Positive Breast Cancer With Stable Disease That Has Not Responded to Letrozole and Palbociclib


Thanks to DeciBio's Andrew Aijian for the analysis, dashboard and commentary. And to OpenTrials for making this kind of data open and accessible.

Batch effects in scRNA-seq: to E or not to E(RCC spike-in)

At the recent Wellcome Trust conference on Single Cell Genomics (Twitter #scgen16) there was a great talk (her slides are online) from Stephanie Hicks in the @irrizarry group (Department of Biostatistics and Computational Biology at Dana-Farber Cancer Institute). Stephanie was talking about her recent work on batch effects in single-cell data, all of which you can read about in her paper on bioRxiv: On the widespread and critical impact of systematic bias and batch effects in single-cell RNA-Seq data. You can also read about this paper over at NextGenSeek.

Adapted from Figure 1 in Hicks et al.


Almost without exception every new technology gets published with a slew of high-impact papers. And almost without exception those papers turn out to be heavily biased. This is not to say we should expect every wrinkle to be ironed out before initial publication - new technologies take a lot of effort and the faster they make it into the public domain the sooner the community can improve them and make them more robust. Often batch effect is the first problem identified: with arrays, with NGS, and now with single-cell RNA-seq.

Stephanie et al looked at 15 published single-cell RNA-seq papers and found that, in the 8 studies investigating differences between groups where the level of confounding could be assessed, it ranged from 82.1% to 100% (see table 1 from the paper: 82, 85, 93, 96, 98, 100 & 100%). All of these studies were designed such that the biological samples were confounded with processing batch. They report that the number of genes detected explained a significant proportion of the observed variability, but that this varied across experimental batches. Confounding the biological question with experimental batch effectively cripples a project:


"Batch effects lead to differences in detection rates, which lead to apparent differences between biological groups"

However the authors do point out that relatively simple experimental design choices can be used to remove the problem.

What does this mean for ERCC and other spike-ins: In her final slides (see "The Wild West"), Stephanie clearly explains the problems we face with batch effects and in normalising single-cell RNA-seq experiments.
  • Batch effects can be a big problem in scRNA-Seq data (but not always). 
  • Batch effects and methods to correct for batch effects have been around for many years (lots of places to start). 
  • Bad news: Poor experimental design is a big limiting factor... also, more complicated because of sparsity (biology and technology), capture efficiency, etc.
  • Good news: Increased awareness about good experimental design. New methods specific for scRNA-Seq are being developed.
It is looking more and more possible to use RNA spike-ins in scRNA-seq experiments specifically as a tool to help normalise the data, and also as a way to reduce or remove batch effects. Stephanie does state that there are still challenges in doing this, and also points to the use of UMI counts to help fix the problem by reducing amplification bias.

However not every protocol recommends spike-ins and there is certainly no clear preference in the community - although I think one is beginning to emerge. Read about how ERCCs and SIRVs are being used in single-cell RNA-seq in the latest paper from Sarah Teichmann's group at EBI/Sanger.

I'm putting effort into understanding spikes in a lot more detail and am sure we'll all be using them routinely in a few more months.

What does this mean for the choice of scRNA-seq platform: My briefest of surveys for the three platforms we're using or looking at in my lab are as follows. Fluidigm suggest using the ArrayControl RNA Spikes (Thermo Fisher Scientific AM1780). Drop-seq suggest using the ERCC spikes (although this is not mentioned in their online protocol). 10X Genomics don't say anything about spikes in their current protocols!

I generated the figure at the top of this post to show where these 3 scRNA-seq platforms fit into Stephanie's figure 1 from the paper. Both C1 and Drop-seq are completely confounded as only one sample is processed per batch. 10X Genomics allows up to 8 samples to be processed together so a replicated "AvsB" study could be completed with zero batch effect.

But in the future we're likely to need 12, 24 or even 96 sample systems that allow us to process a scRNA-seq experiment in one go. Whilst it may well be possible to design Fluidigm C1 chips that can process more samples, each with fewer cells, or for Drop-seq to emulate 10X Genomics, or even for 10X Genomics to move to a larger sample format chip; none of this will solve the problem of collecting large numbers of single-cell samples without introducing batch effects further upstream in the experiment.


The take home message is to spend time on experimental design, and to replicate your study - simple enough stuff! Biological replication allows samples to be randomised across scRNA-seq prep runs and, if necessary, across sequencing flowcells (a minimal sketch of one way to do this is below). This generally allows batch effects to be removed from the experiment, even if they are significant.
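As a minimal sketch of what that looks like in practice (hypothetical sample names, not code from the paper): deal each biological group's replicates evenly across processing batches so that group and batch are never confounded.

```python
import random
from collections import defaultdict

def block_randomise(samples_by_group, n_batches, seed=1):
    """Deal each biological group's replicates evenly across processing
    batches, so that group and batch are not confounded."""
    rng = random.Random(seed)
    batches = defaultdict(list)
    for group, replicates in samples_by_group.items():
        reps = list(replicates)
        rng.shuffle(reps)                       # randomise order within the group
        for i, rep in enumerate(reps):
            batches[i % n_batches].append(rep)  # deal across batches in turn
    return dict(batches)

# Hypothetical A-vs-B study: four replicates per group, two scRNA-seq prep runs.
design = block_randomise(
    {"A": ["A1", "A2", "A3", "A4"], "B": ["B1", "B2", "B3", "B4"]},
    n_batches=2,
)
print(design)   # each prep run ends up with two A and two B replicates
```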

SIRVs: RNA-seq controls from @Lexogen

This article was commissioned by Lexogen GmbH.

My lab has been performing RNA-seq for many years, and is currently building new services around single-cell RNA-seq. Fluidigm's C1, academic efforts such as Drop-seq and inDrop, and commercial platforms from 10X Genomics, Dolomite Bio, Wafergen, Illumina/BioRad, RainDance and others make establishing the technology in your lab relatively simple. However the data being generated can be difficult to analyse, and so we've been looking carefully at the controls we use, or should be using, for single-cell and standard RNA-seq experiments. The three options I'm considering are Lexogen's SIRVs (Spike-In RNA Variants), SEQUINs, and the ERCC 2.0 (External RNA Controls Consortium) controls. All are based on synthetically produced RNAs that aim to mimic the complexities of the transcriptome: Lexogen's SIRVs are the only controls currently available commercially; ERCC 2.0 is a developing standard (Lexogen is one of the groups contributing to the discussion); and SEQUINs for RNA and DNA were only recently published in Nature Methods.

You can win a free lane of HiSeq 2500 sequencing of your own RNA-seq libraries (with SIRVs of course) by applying for the Lexogen Research Award


Lexogen's SIRVs are probably the most complex controls available on the market today as they are designed to assess alternative splicing, alternative transcription start and end sites, overlapping genes, and antisense transcription. They consist of seven artificial genes, in vitro transcribed as multiple (6-18) isoforms to generate a total of 69 transcripts. Each has a 5' triphosphate and a 30nt poly(A)-tail, enabling both mRNA-seq and total RNA-seq methods. Transcripts vary from 191 to 2,528nt in length and have variable (30-50%) GC content.


Want to know more? Lexogen are hosting a webinar to describe SIRVs in more detail on October 19th: Controlling RNA-seq experiments using spike-in RNA variants. They have also uploaded a manuscript to bioRxiv that describes the evaluation of SIRVs and provides links to the underlying RNA-seq data. As a bioinformatician you might want to download this data set and evaluate the SIRV reads yourself. Or read about how SIRVs are being used in single-cell RNA-seq in the latest paper from Sarah Teichmann's group at EBI/Sanger.

Before diving into a more in-depth description of the Lexogen SIRVs, and how we might be using them in our standard and/or single-cell RNA-seq studies, I thought I’d start with a bit of a historical overview of how RNA controls came about...and that means going back to the days when microarrays were the tool of choice and NGS had yet to be invented!

RNA quality control – MAQC: The use of controls is recommended in any experiment, and the lack of them is one of the oft-cited reasons for the current reproducibility crisis. Nearly everyone who's worked on differential gene expression in the last fifteen years has heard of the MAQC (MicroArray Quality Control) study. Although four sources of RNA were evaluated, Stratagene's Universal Human Reference RNA and Ambion's Human Brain RNA samples were chosen because of the number of genes expressed at a detectable level, and the size of the fold changes between the two samples. These two control samples were used to evaluate five microarray platforms in an international project involving 137 participants from 51 organisations (see Nat Biotech 2006). Labs like mine adopted, and continue to use, the MAQC controls in our differential gene expression pipelines, which today are almost all based on RNA-seq methods. We used them in my lab to show how detection sensitivity drops as RNA inputs are reduced to under 100ng (something I keep meaning to repeat with RNA-seq).

The move to RNA-seq has had a dramatic impact on our ability to perform complex experiments. We are no longer limited to asking questions about the differential expression of genes where we have sequence information available to make an array. RNA-seq allows us to analyse the whole transcriptome; to assess differential gene expression (oligo-dT enriched mRNA-seq is the most widely used method), as well as differential splicing, allele specific expression, polyA tail length, transcription initiation and termination, microRNA, lincRNA, etc, etc, etc (see my "wish list" for controls at the bottom of this post).

The MAQC controls we used are simply not up to the more complex job that RNA-seq presents. Both the ABRF and SEQC papers used MAQC samples, which are admixtures of multiple individuals (I discussed these limitations in a 2014 post), but both included the ERCC controls as well.

Newer, more carefully designed and manufactured controls are available that can better serve the needs of biologists; and this is where SIRVs come in.

The SIRV workflow: from sample to answer
RNA quality control – Lexogen and beyond: SIRVs are designed to represent much of, but not all of, the complexity of Eukaryotic transcriptomes e.g. differential gene expression, differential splicing, polyA tail length variation, GC content, etc. SIRVs are designed to be added to samples before RNA extraction, or starting the RNA-seq library prep. They should allow an objective assessment of the technical biases in library preparation, sequencing and analysis; and ultimately should improve our ability to make biological insights from comparison of experimental conditions. They are a huge leap forward from the MAQC controls, and a significant step ahead of the ERCC1.0 controls, which are restricted to single-exon transcripts.

How are SIRVs made: SIRVs were designed to be similar to human gene structures, with overlapping multi-exon genes transcribed in both sense and antisense orientations, with alternative splicing and alternative transcription start and end sites. Genes are in vitro transcribed from linearised plasmids to produce full-length transcripts, which are subject to very careful quality control and quantitation, including spectrophotometric, molecular weight, and Agilent Bioanalyser analyses. After QC and quantitation, SIRV transcripts are mixed at equimolar concentrations (E0), or with 8-fold (E1) or 128-fold (E2) variation.


Designing SIRVs: A comparison of SIRV1 and KLK5

How are SIRVs used: Spiking SIRVs into your samples requires some careful consideration of how you'll use the data they provide in downstream assessment. Today the most important control in my lab is simply whether the library prep has worked or, where it did not, whether the lab or the sample was the cause of the failure. Our use of MAQC controls on a plate of samples is great, but extending this to an internal control in every sample is going to be better. However I don't want controls to dominate the experiment or they'll add too much to the costs of library preparation and sequencing.

SIRVs themselves don't need much data to generate useful results, and around 1% of your sequencing reads should be sufficient for most labs. However determining how much SIRV mix to add to your samples before extraction, or to your RNA before library prep, can require some empirical testing as the amount of RNA in a sample or a cell differs so much. As a rule of thumb 95% of RNA is ribosomal RNA, and the other 5% is mRNA (and non-coding RNAs). For an experiment starting with 100ng of total RNA in an mRNA-seq workflow, approximately 50pg of SIRVs would represent 1% of the 5ng of mRNA present.
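As a rough worked example of that arithmetic (a sketch only; the function name is mine and the 5%/1% defaults are just the rule-of-thumb figures above):

```python
def sirv_spike_mass_pg(total_rna_ng, mrna_fraction=0.05, spike_fraction=0.01):
    """Approximate SIRV mass (pg) for an oligo-dT mRNA-seq workflow:
    spike in ~1% of the expected mRNA content of the total RNA input."""
    mrna_ng = total_rna_ng * mrna_fraction
    return mrna_ng * spike_fraction * 1000   # convert ng to pg

per_sample = sirv_spike_mass_pg(100)   # 100 ng total RNA input
print(per_sample)                      # 50.0 pg per sample
print(per_sample * 96 / 1000)          # ~4.8 ng for a 96-sample oligo-dT master mix
```

The same function scales the spike to the input amount, which is why the bulk and plate-level numbers quoted below fall out of it directly.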

SIRVs are available in three configurations, E0, E1 & E2, that mix the in vitro transcribed RNAs at equimolar concentration (mix E0), or with up to 8-fold (mix E1) or up to 128-fold (mix E2) variation in concentration. Importantly SIRVs are built in a modular format and should be compatible with other spike-in controls such as the ERCCs. Additional modules should address transcript lengths, polyA tail length variation, etc.

Coinciding with the webinar on October 19th, Lexogen will release the “SIRVs suite” (see "How are SIRVs analysed" below) for analysis of spike-in data. This will also include an "Experiment Designer" tool to calculate recommended spike-in ratios based on known or expected input for the RNA content, mRNA ratio, and type and efficiency of the workflow.

SIRVs in bulk RNA-seq: Bulk RNA-seq experiments can use SIRVs as process controls in place of the MAQC Brain and UHRR samples, allowing a full 96 samples to be run on each plate. Assuming a 100ng total RNA input, just 50pg of SIRVs are needed per sample, with around 5ng added to the oligo-dT master mix used in the enrichment step. The use of SIRV E0 is recommended for process QC, but E1 and E2 may be useful when evaluating new methods for accuracy and precision of differential transcript detection and quantitation.

SIRVs in scRNA-seq: Single-cell RNA-seq has quickly adopted spike-in controls, with Hashimshony et al presenting their use of ERCC spikes in the CEL-Seq protocol. Both Wu et al 2013 and Treutlein et al 2014 used the ERCC mixes at a 1:40,000 dilution spiked into the cell lysis mix of the Fluidigm C1 protocol. And Svensson et al used the ERCC and SIRV spike-ins to assess the sensitivity and accuracy of various protocols across a standard analysis pipeline. This demonstrates the utility of RNA control spike-ins, but also the requirement for careful dilution to avoid swamping single-cell RNA-seq experiments with control data, or not having enough to QC the data before interpreting results. Assuming each single cell has around 20pg of total RNA then just 200fg of SIRVs are needed per sample; the amount of SIRV added, and exactly where in the protocol to add it, is highly dependent on the single-cell RNA-seq protocol being used.



How are SIRVs analysed: Lexogen will release the Galaxy-based "SIRVs suite" for uploading, evaluating and comparing spike-in data. This will allow SIRV users to compare results from their experiments to anonymised data, and should help determine if your own experiment is any good. Back in 2003/4 I developed rptDB, a tool to compare QC data between Affymetrix arrays. It had over 3,500 samples submitted to it, and allowed a quick and easy call on whether your data were "good" or "bad" - highly context dependent of course! As a user, if I had received data from a core lab or service provider, or were downloading RNA-seq data for meta-analysis, then being able to select only data where SIRV, or other, controls had been used, and where results were shown to be high-quality, would most likely save me considerable time in cleaning up data before starting.

SIRVs are not designed to be used as a normalisation tool. Whilst spike-ins have been considered for this, they are not really reliable enough for standard normalisation procedures. The development of novel normalisation algorithms appears to offer hope for the future (see Risso 2014), and approaches like this might be applicable to SIRVs. I suspect this will be an active area of algorithm development over the next couple of years because of the huge interest in single-cell RNA-seq.





The competition: alternative RNA-seq controls
Sequins: 'Sequins' (sequencing spike-ins) were developed by the Garvan Institute and recently published in Nature Methods. Sequins are conceptually similar to SIRVs. They are a set of synthetic RNA isoforms that align to an artificial in silico chromosome with no homology to known genomes. They represent full-length spliced mRNA isoforms at a range of concentrations, and can be used to assess differential gene expression and alternative splicing pipelines. The authors state that sequins can be used for normalisation, and refer to the same Nature Biotech paper as I did above. In their Nature Methods paper they show some very nice results from scaling normalisation using sequins, and I hope these results will ultimately be achievable with any well-designed spike-in series.

In the back-to-back Nature Methods publications the team at Garvan show how sequins can be used in RNA-seq and DNA-seq experiments to assess biases and determine the limits of detection, quantitation and analytical methods. Sequin genes are mixed in a two-fold serial dilution, with a minimum of three genes per dilution, to span an approximately one-million-fold range. The team also developed 24 sequins to represent cancer fusion genes and used these to assess fusion gene detection and quantitation. They also reported that split reads significantly outperformed read-pairs in their correlation with sequin concentration – this has a significant impact on sequencing format, as many groups today use paired-end reads where longer single-end reads may be more sensitive, and would also be around 40% cheaper.
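A quick back-of-envelope check of that mixing scheme (my own sketch, not from the paper): a two-fold ladder spanning a million-fold range needs about 21 concentration points, so at a minimum of three genes per point the mixture has to contain at least ~63 sequin genes.

```python
import math

fold_range = 1e6          # reported concentration span of the sequin mixture
dilution_step = 2         # two-fold serial dilution between neighbouring points
min_genes_per_point = 3   # minimum genes at each concentration

points = math.ceil(math.log(fold_range, dilution_step)) + 1
print(points, "concentration points,", points * min_genes_per_point, "genes minimum")
# -> 21 concentration points, 63 genes minimum
```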



ERCC 2.0: the original ERCC 1.0 controls are a mix of 92 relatively simple single-exon transcripts of varying length and GC content. They are used as a mix at known concentrations, spiked into samples before library preparation. ERCC 2.0 aims to update the spikes to better represent the complexity of the transcriptome, and to provide FFPE-derived controls. Again they are conceptually similar to SIRVs, and Lexogen were one of 9 groups invited to present at the 2014 NIST ERCC 2.0 workshop at Stanford University.



Conclusions: The use of controls in RNA-seq experiments is an absolute requirement if you want to get the best out of your experiments. Bulk RNA-seq can benefit from a relatively simple data QC of the controls before moving on to more complex differential gene expression and splicing analyses. And including spike-in controls may allow easier comparison of longitudinal data sets, or between labs. Single-cell RNA-seq has shown an absolute requirement to include spike-ins, although the very latest papers suggest that spiked-in transcripts may not truly mirror human mRNAs in the protocols used, due to much shorter poly-A tails (30 vs 200+bp), and that they may underestimate detection sensitivity by up to ten-fold.

SIRVs, more recently SEQUINs, and soon ERCC 2.0 controls can be further enhanced, and manufacturers should not consider their job complete! With protocols like Pacific Biosciences' Iso-Seq and the advent of Oxford Nanopore's direct RNA sequencing, longer and longer transcripts can be assessed and this will need to be controlled for. Phased sequencing, possibly from long RNA molecules on 10X Genomics, is likely to need controls where variants are phased. Additionally, PacBio and nanopore sequencing also offer the ability to detect and quantify RNA base modifications. All of this shows how far the controls we might use still have to go.

My RNA controls wish list:
  1. differential gene expression normalisation
  2. differential splicing
  3. allele specific expression
  4. transcript and polyA tail length variation
  5. GC content
  6. transcription initiation and termination
  7. non poly-adenylated RNAs e.g. microRNA, lincRNA
  8. pseudogene mapping
  9. limits of detection
  10. RNA variant detection at different MAF
  11. High-quality and degraded FFPE RNA
  12. Spike-ins with corresponding baits for in-solution capture
  13. Spike-in RNA encapsulated in synthetic cells
  14. Phased variants on long RNAs
  15. RNA base modifications
Please let me know what you’d like to add by leaving a comment below.

Controlling for bisulfite conversion efficiency with a 1% Lambda spike-in

The use of DNA methylation analysis by NGS has become a standard tool in many labs. In a project design discussion today somebody mentioned a control for bisulfite conversion efficiency that I'd missed; as it's such a simple one I thought I'd briefly mention it here. In their PLoS Genet 2013 paper, Shirane et al from Kyushu University spiked in unmethylated lambda phage DNA (Promega) to check that the C/T conversion rate was greater than 99%.






The bisulfite conversion of cytosine bases to uracil, by deamination of unmethylated cytosine (as shown above), is the gold standard for methylation analysis.

Users identify the C/T transitions by comparing bisulfite-treated and untreated samples, or by comparing to a known reference. However bisulfite treatment is a harsh biochemical reaction and can cause large losses of template DNA. As such, controlling for and measuring conversion efficiency is important when drawing conclusions about the methylation data from NGS experiments. As a reminder: bisulfite does not convert methylated or hydroxymethylated cytosine, allowing users to discriminate between unmethylated cytosine (C) and methylcytosine (mC) or hydroxymethylcytosine (hmC).

We're likely to start using this control if it works well in the project we have just kicked off. In the paper they added 1ng of lambda DNA to 1000 oocytes before performing a PBAT analysis. We'll aim for a 1% spike-in, but need to consider how much to add to each sample, and whether lambda is the right spike-in as we're using an RRBS method for this project. To check its suitability I grabbed the lambda sequence from GenBank and did an in silico MspI digest using WebCutter 2.0. I found 330 cut sites - which should be plenty for checking efficiency.
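The same check is easy to script. Below is a minimal sketch in Python rather than WebCutter: it counts MspI sites (recognition sequence CCGG, cut after the first C) in the lambda genome and tallies the fragments an RRBS-style size selection would keep. The filename and the 40-220bp window are my assumptions for illustration; the sequence itself is GenBank accession J02459.

```python
def read_fasta(path):
    """Return the concatenated, upper-cased sequence from a single-record FASTA."""
    with open(path) as fh:
        return "".join(line.strip() for line in fh if not line.startswith(">")).upper()

lam = read_fasta("lambda_J02459.fasta")   # assumed local copy of the GenBank record

site = "CCGG"
cuts = []
pos = lam.find(site)
while pos != -1:
    cuts.append(pos + 1)          # MspI cuts C^CGG, i.e. one base into the site
    pos = lam.find(site, pos + 1)

print(f"{len(cuts)} MspI sites found")    # WebCutter reported 330

# Fragment lengths for the linear lambda genome
edges = [0] + cuts + [len(lam)]
frags = [b - a for a, b in zip(edges, edges[1:])]
print(sum(40 <= f <= 220 for f in frags), "fragments in a 40-220bp size selection")
```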

Want to learn more about bisulfite conversion in general? Take a look at Zymo's website, it's an excellent resource.

Does the world have too many HiSeq X Tens?

Illumina stock dropped 25% after a hammering by the markets following the announcement that Q3 revenues would be 3.4% lower than expected, at just $607 million. This makes Illumina a much more attractive acquisition target (although I doubt this summer's rumours of a Thermo bid had any substance), and also makes a lot of people ask the question "why?"

The reason given for the shortfall was "a larger than anticipated year-over-year decline in high-throughput sequencing instruments", i.e. Illumina sold fewer sequencers than it expected to. It is difficult to turn these revenue figures and statements into the number of HiSeq 2500s, 4000s or Xs by which Illumina missed its internal forecasts, but according to Francis de Souza Illumina "closed one less X deal than anticipated" - although he did not say if this was an X5, X10 or X30! Perhaps more telling was that de Souza was quoted as saying that "[Illumina was not counting on a continuing increase in new sequencer sales]"...so is the market full to bursting?



Before diving into my own analysis (you might also like to read GenomeWeb's coverage), I would like to put these numbers in perspective. A third-quarter revenue of $607 million annualises to nearly $2.5 billion over the full year (versus $2.3B in 2016 and $2.1B in 2015; numbers from Illumina data here), and revenues grew by 10% year on year. This does not seem like bad news from an academic user's perspective!

Is there such a thing as too many sequencers: Illumina have talked about how they were surprised by the interest in the X Ten, and have sold far more units than they initially forecast. The word on the street seems to be that only a few X Ten labs (Broad, NYGC, Human Longevity) are working at capacity. Illumina have said that reagent pull-through on the X Ten has been about $650K per X per year, which is only half of the theoretical $1.2 million per X per year.

Sales of the HiSeq 4000 appear not to have been as strong as the 2500 was on its launch. The NextSeq seems to be popular, with almost 1,000 units out in the field, especially for NIPT use, but also in medium-sized labs wanting their own sequencer. I suspect a fair number of MiniSeqs are rolling off the production line too (although whether they offer good value for money is debatable).

But Illumina's main reason for the slightly lower than expected performance was clearly lower sales of instruments, particularly in Europe last quarter. Todd Campbell at The Motley Fool asked an important question about what's happening in Europe: "Europe was [growing] slower than the rest of the world." He also posed the questions "Why? What's so special about Europe? What are the things that could be going into the reasoning behind Europe growing more slowly than the other parts of the world?" He went on to discuss competition (from Oxford Nanopore) as being a factor, but most telling was something he picked up on from the Illumina conference call when their results were announced at the end of Q2: "Europe is slowing because of sharing of devices"!

I'd wholly subscribe to the "glut of capacity and increased use of outsourcing" hypothesis. If the glut does not go away, and if labs continue to move to outsourcing, then Illumina will sell decreasing numbers of instruments and service contracts, but consumables pull-through should be higher from each box. Ultimately I think this is a win for Illumina as more science will be published using their technology - and that's really what we all want.


I run a core lab, and I know lots of other people who do in Europe, Africa, and across the world. Sharing Illumina (and other) instruments in core labs has been a part of science for a very long time. It makes good sense scientifically and economically (I know I'm biased). And from where I sit I can see many Illumina sequencers gathering dust (metaphorically speaking), or being run at 25% or lower utilisation. People got the funding to buy these amazing devices; but not the money to staff the lab, to service them, and to fund the projects to run on them. Perhaps worse is the opportunity cost of the lost science; science that could not be done because someone spent money on a sequencer rather than sequences.


Maybe instrument sales have slowed down in Europe because we've got wise to this problem; maybe scientists in Europe have seen how great core labs can be, and that shared devices with high utilisation are a good thing for science in general. But what happens, if I'm right, when the rest of the world realises it has too many sequencers but not as many results as it expected, and focuses on buying sequences rather than sequencers?

Will users continue to purchase consumables at an ever increasing rate: Illumina's business model has been described as "a simple razor and blade model: Illumina makes one-time sales of large machines at lower margins, then provides consumables needed for use in their operation on an ongoing basis."

A tweet earlier in the year from Kinghorn Genomics is one of the few public figures I've seen for actual sequencing throughput on an X Ten. 1,100 genomes in one month is astounding, but still 20-30% short of the 1,500 per month figure in Illumina's specs. Very few owners openly discuss the numbers of samples going through their instruments, and Illumina are very cagey about reagent pull-through in individual labs. It seems pretty clear that X Ten labs simply can't pull in the required numbers of samples to match Illumina's specs - and Kinghorn Genomics, at around 70% utilisation, is at the high end of reagent pull-through.


Illumina's consumables are a highly profitable business, with gross margins around 70%, and these margins have been at that level for as long as I can remember. I don't want to skip over the fact that Illumina has also invested heavily in R&D, and is investing heavily in the clinical adoption of its core technology via Helix and Grail, so some of that 70% margin is going somewhere that is likely to be useful to me in the future. But Illumina have cited weakness in the HiSeq franchise outside of the X – both instrument shipments and consumables. Reagent pull-through on HiSeq was below their estimate of ~$350K per year, again pointing to a glut of sequencers rather than sequencing projects. So reagents are perhaps the most important thing for Illumina to focus on.

  • Total revenue increased by $93.9 million to $1,171.9 million in the first half of 2016; up by 9% over 2015.
  • Consumables revenue (63% of total) increased by $128.4 million to $740.1 million in the first half of 2016; up by 21% over 2015 "driven by growth in the sequencing instrument installed base".
  • Instrument revenue (20% of total) decreased by $58.5 million to $243.2 million in the first half of 2016; down by 19% over 2015 "primarily due to lower shipments of our high-throughput platforms".
  • Service and other revenue (15% of total) increased by $23.2 million to $179.2 million in the first half of 2016; up by 15% over 2015 "driven by revenue from genotyping services and extended instrument service contracts associated with a larger sequencing installed base".

What locks us into Illumina: The capital cost of replacing an Illumina fleet is very high; my own lab has around £2 million invested (2x HiSeq 4000, 1x HiSeq 2500, 1x NextSeq, 2x MiSeq) - we couldn't simply go out and buy machines from another vendor, even if there were one. The real tie-in is the infrastructure we've built up around the use of Illumina sequencing. Users are unlikely to switch until there is a really good competitor out there...and Life Tech's SOLiD and Ion Torrent technologies just were not good enough.

Predicting the future: For the future I'm as confident as everyone else that NGS usage is going up: bigger projects, more samples, more sequencing, more data - that's a great scenario for Illumina. They might be a bit stuck with the next big leap in instrument yields, as yields would need to jump significantly to make labs like mine purchase new boxes, and that could land them back in the same position they were in in 2011. If the economic case for a new machine can't be made then labs will find it hard to get funding for incremental changes. And if Illumina do make a big leap then many labs may prefer to share the infrastructure costs and aim to bring down experimental costs. Where do Illumina go in the research space next if they can't bring us cheaper sequencing?



Q: What will Illumina announce at J.P.Morgan? A new sequencer? The $500 genome? Nanopores?

The use of NGS in oncology might take ten years to become profitable given the pace at which healthcare systems can adopt new technologies. I know from my experience of the NHS that a few labs can be leading lights, but the majority need to be dragged into accepting change of almost any kind. Oncology is tough, but it is a huge, and highly profitable, market so the effort from Illumina is likely to be worth it. Illumina certainly think so; SeekingAlpha quoted Francis de Souza (Illumina CEO) as saying "We spent a decade selling instruments to researchers who are experts and understand genomics. Now we're seeing applications take off, which is a much bigger market for us." Whether the recent stock fall was partly because the markets see the realisation of this "bigger market" as too much of a future gamble is unclear to me. Verinata, Grail and Helix are really exciting ventures, but how quickly can they add to Illumina's revenues and profits?

The rapid adoption of NGS in NIPT might shed some light on the future. Verinata is now contributing high single-digit percentages to Illumina's revenues, and this could reach 10% as soon as 2018. I'd highly recommend anyone who can get access to BBC iPlayer to watch the "A World Without Down's" documentary!

I thought I'd finish up with a look to the future; particularly to the other NGS technology that we might be using alongside Illumina routinely by 2020 - Oxford Nanopore. The technology, soon to be "a genome centre in a box", and possibly iPhone compatible, is starting to gain traction outside of the hardcore fanboys and fangirls like Nick Loman and Josh Quick. Right now it is certainly unproven in the commercial sense (the MAP is a closed community, not a full commercial launch) and a niche tool. But R9 makes nanopore sequencing easy, and the most recent updates from Clive Brown point to a future where we might use nanopores alongside SBS. If the ONT tech is truly disruptive then the future may be decidedly less orange!

I'd not want to forget to mention Pacific Biosciences now that the Sequel appears to be getting some traction (over 100 instruments sold since launch, compared to 100-150 RS IIs). And the 50x drop in the amount of DNA required is going to make this a tool that people with limited sample availability can now consider using.


But we should not forget that Illumina is a company that can deliver on innovation. Whilst Illumina did not invent SBS - Solexa, a small UK company, did - Illumina turned Solexa's $2.5 million of revenues in 2006 into a $100 million business in one year! Many readers will remember the release of the HiSeq, MiSeq, NextSeq and X Ten - all significant leaps for genomics - and I'm betting they've got some pretty cool tech up their sleeves yet.

Finally: Do you think there are too many sequencers out there? Should we focus on buying sequences rather than sequencers? If the majority of users answer yes to these questions then sequencer sales may well continue to decline in the short term. But reagent pull-through on each box should increase, and Illumina's focus for research sequencing might shift to "blades rather than razors" - to driving up utilisation of their installed base.

Unintended consequences of NGS-based NIPT?

The UK recently approved an NIPT test to screen high-risk pregnancies for foetal trisomy 21, 13, or 18 after the current primary screening test, and in place of amniocentesis (following on from the results of the RAPID study). I am 100% in favour of this kind of testing and 100% in favour of individuals, or couples, making the choice of what to do with the results. But what are the consequences of this kind of testing, and where do we go in a world where cfDNA foetal genomes are possible?


I decided to write this post after watching "A World Without Down's", a documentary on BBC2 presented by Sally Phillips (of Bridget Jones fame), mother to Olly, who has Down's syndrome. She presented a programme where the case for the test was made (just), but which was very clearly pro-Down's, although not quite to the point of being anti-choice.

My own personal experience of Down's is limited, and I'd watched the documentary more out of excitement to see how NGS is being rolled out across the NHS; particularly because the same technology is being applied in cancer and is likely to transform patient treatment. My view before watching was that this new NIPT test could only be a good thing. The programme made me see that there are likely to be unintended consequences of this kind of testing, and that there may be darker sides to the use of the technology. It made me think more carefully about the issue, but in the end I'm still 100% in favour of the test.

Unintended consequences of cell-free DNA testing have been reported previously, with the discovery of cancer in an expectant mum first reported in 2013. How we deal with these issues is a matter of ongoing debate. For Down's, the programme highlighted the negative way expectant mothers and fathers are given the news that they may have a child with Down's, and that better information can only lead to a more informed choice - not difficult to agree with that. Unfortunately the programme can't escape its Herodotian title. This test won't lead to "a world without Down's", but how people use the information might.

I highlighted the programme on Twitter after watching it, and posted again after reading an article in the Guardian, "Fears over new Down's syndrome test may have been exaggerated, warns expert", in which Prof Sir David Spiegelhalter was quoted as saying that terminations would not go up - based on the current models being used. I did not disagree with his stats (I'd be crazy to do that), but models can be wrong, and that was the basis of my tweet.

The main argument from Phillips in the programme is that this test will result in more terminations, and that means fewer people being born with Down's syndrome. She visited Iceland, which she stated has not had a child with Down's syndrome born in the last 5 years. This is surprising, as I'd expect a country like Iceland to have a testing regime with as many false negatives as anyone else - a few children with Down's should have been born...and data from the WHO seem to suggest this is indeed the case.



Ultimately even if 100% of parents did choose to abort after receiving test results, as long as they were well informed before making their decision, then we've done the right thing. Haven't we?

Trisomies 13, 18 and 21 are the only things tested for right now. But the underlying technology could ultimately use whole genome sequencing and find the full spectrum of genetic abnormalities: such as an increased risk of psoriasis, glaucoma, and Alzheimer's. If my mum had decided these were not traits she wanted her baby to have I'd not be writing this blog.

MinION: 500kb reads and counting

A couple of tweets today point to the amazing read lengths Oxford Nanopore's MinION sequencer is capable of generating - over 400kb!

Dominik Handler tweeted a plot showing the read-length distribution from a run. In replies following the tweet he describes the DNA handling as involving "no tricks, just very careful DNA isolation and no, really no pipetting (ok 2x pipetting required)".


And Martin Smith tweeted an even longer read, almost 500kb in length...


Exactly how easily we'll all see similar read lengths is unclear, but it is going to be hugely dependent on the sample, and probably on having "green fingers" as well.

Here's Dominik's gel...


10X Genomics updates

We had a seminar from 10X Genomics today presenting some of the most recent updates to their systems and chemistry. The new chemistry for single-cell gene expression and the release of a dedicated single-cell controller show how much effort 10X have placed on single-cell analysis as a driver for the company. Phasing is looking very much like the poor cousin right now, but still represents an important method for understanding genome organisation, regulation and epigenetics.



Single cell 3' mRNA-seq V2: the most important update from my perspective was that 10X libraries can now be run on the HiSeq 4000, rather than just the 2500 and NextSeq. This means we can run these alongside our standard sequencing (albeit with a slightly weird run type).

The new chemistry offers improved sensitivity to detect more genes and more transcripts per cell, an updated Cell Ranger 1.2 analysis pipeline, and compatibility with all Illumina sequencers. Sequencing is still paired-end, but read 1 (26bp) carries the 10X cell barcode and UMI, index 1 is the sample barcode, and read 2 is the cDNA, reading back towards the polyA tail.
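For reference, here is a minimal sketch of how that 26bp read 1 breaks down; the 16bp cell barcode plus 10bp UMI split is my understanding of the v2 chemistry rather than anything 10X presented, and this is obviously not the Cell Ranger implementation.

```python
BARCODE_LEN = 16   # assumed cell barcode length for 3' v2 chemistry
UMI_LEN = 10       # assumed UMI length; barcode + UMI = the 26bp read 1

def split_read1(seq):
    """Return (cell_barcode, umi) from a 10X 3' v2 read 1 sequence."""
    assert len(seq) >= BARCODE_LEN + UMI_LEN, "read 1 shorter than 26bp"
    return seq[:BARCODE_LEN], seq[BARCODE_LEN:BARCODE_LEN + UMI_LEN]

# Hypothetical read 1 sequence (26bp)
barcode, umi = split_read1("AAACCTGAGAAACCATTTTCATGAGA")
print(barcode, umi)   # AAACCTGAGAAACCAT TTTCATGAGA
```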

It is really important in all the single-cell systems to carefully prepare and count cells before starting. You MUST have a single-cell suspension and load 100-2,000 cells per microlitre in a volume of 33.8ul. This means counting cells is going to be very important, as the concentration loaded affects the number of cells ultimately sequenced, and also the doublet rate. Counting cells can be highly variable; 10X recommend using a haemocytometer or a Life Tech Countess. Adherent cells need to be trypsinised and filtered using a Flowmi cell strainer or similar. Dead and/or lysed cells can confuse analysis by leaching RNA into the cell suspension - it may be possible to detect this by monitoring the level of background transcription across cell barcodes. The interpretation of the QC plots provided by 10X is likely to be very important, but there are not many examples of these plots out there yet, so users need to talk to each other.

There is a reported doublet rate of 0.8% per 1,000 cells, which keeps 10X at the low end of doublet rates among single-cell systems. However it is still not clear exactly what the impact of this is on the different types of experiment we're being asked to help with. I suspect we'll see more publications on the impact of doublet rate, and analysis tools to detect and fix these problems.

The sequencing per cell is very much dependent on what your question is. 10X recommend 50,000 reads per cell, which should detect around 1,200 transcripts in PBMCs, or 6,000 in HEK293 cells. It is not completely clear how much additional depth will increase the number of genes detected before you reach saturation, but it is not worth going much past 150,000 reads per cell.
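To put those read-per-cell figures in sequencing terms, here is a quick back-of-envelope lane calculation; the ~350 million read pairs per HiSeq 4000 lane figure is my assumption and will vary from run to run.

```python
def lanes_needed(n_cells, reads_per_cell=50_000, reads_per_lane=350e6):
    """Rough HiSeq 4000 lane count for a 10X run; reads_per_lane is an
    assumed per-lane yield, not a guaranteed spec."""
    return (n_cells * reads_per_cell) / reads_per_lane

print(round(lanes_needed(5_000), 2))    # ~0.71 lanes for 5,000 cells at 50k reads/cell
print(round(lanes_needed(20_000), 2))   # ~2.86 lanes for 20,000 cells
```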

1 million single cells: 10X also presented a 3D t-SNE plot of the recently released 1-million-cell experiment. This was an analysis of E18 mouse cortex, hippocampus, and ventricular zone. The 1 million single cells were processed as 136 libraries across 17 Chromium chips, and sequenced on 4 HiSeq 4000 flowcells. This work was completed by one person in one week - it is amazing to think how quickly single-cell experiments have grown from 100s to 1,000s of cells, and become so simple to do.

Additional sequencing is underway to reach ~20,000 reads per cell. All raw and processed data will be released without restrictions.

The number of cells required to detect a population is still something that people are working on. The 1 million cell dataset is probably going to help the community by delivering a rich dataset that users can analyse and test new computational methods on.

What's next from 10X: A new assay coming in Spring 2017 is for Single Cell V(D)J sequencing, enabling high-definition immune cell profiling.


The seminar was well attended showing how much interest there is in single-cell methods. Questions during and after the seminar included the costs of running single-cell experiments, the use of spike-ins (e.g. ERCC, SIRV, Sequins), working with nuclei, etc.

In answering the question about working with nuclei, 10X said "we tried and it is quite difficult"...the main difficulty was the lysis of single nuclei in the gel droplets. Whilst we might not be able to get it at single-cell resolution, this difficulty in lysing the nucleus rather than the cell might possibly be a way to measure and compare nuclear versus cytoplasmic transcripts.

CoreGenomics has moved

"CoreGenomics is dead...long live CoreGenomics"...the CoreGenomics blog has moved to its new home: http://enseqlopedia.com/coregenomics. Please update your bookmarks and register to follow the new blog, for updates on the NGS map (coming soon), and to access the new Enseqlopedia (coming soon)!

Enseqlopedia: Last year I started the process of building the new Enseqlopedia site, after five years of blogging here on Blogger. Whilst Enseqlopedia is still being developed, the CoreGenomics blog has moved over and you can find all the old content there too. Commenting should be much easier for me to manage, so please do give me your feedback directly on the site.

NGS mapped: Currently I'm working on the newest implementation of the Google Maps sequencer map that Nick Loman and I put together many years ago. The screenshot of the demo gives you an idea of what's changed. The big differences are a search bar that allows you to select technology providers and/or instruments. The graphics also now give a pie-chart breakdown of the instruments at each location...you can clearly see the dominance of Illumina!

Other technologies that will appear soon include single-cell systems from the likes of 10X Genomics, Fluidigm, Wafergen, BioRad/Illumina, Dolomite, etc, so users can find people nearby to discuss their experiences with (we're also restarting our beer & pizza nights as a single cell club here in Cambridge, so keep an eye out for that on Twitter).

Lastly a change that should also happen in 2017 is the addition of users to the map. I'm hoping to give anyone who uses NGS technologies a way to list their lab, and highlight the techniques they are using. Again the aim is to make it easier for us to find each other and get talking.


Enseqlopedia.com is a big step for me. I hope you'll think it was worthwhile in a year or so. There's one feature I've not mentioned until now which I'm hoping you'll get to hear more about in the very near future - the Enseqlopedia itself. Watch out for it to appear in the press.

Thanks so much for following this blog. I'm sad to leave Blogger. I hope you'll come with me to Enseqlopedia/coregenomics.