Portola's APEX Trial Design Was Too Clever By Half


On Thursday, March 24, Portola announced the top-line results from APEX, a large phase 3 trial of the novel oral anticoagulant betrixaban. With two other Factor Xa inhibitors already on the market (and no head-to-head comparison data), the findings of APEX are of more immediate interest to clinical trial designers than to patients.

The drug itself may be a follower, but the design of APEX was cutting-edge. As the Portola press release proudly noted, APEX was the first trial of an anticoagulant to incorporate an enrichment design. The primary endpoint (the number of thrombotic events) would be compared with the standard-of-care arm first in a subset of patients considered at highest risk of thrombosis (those with elevated levels of the fibrinogen cleavage product called D-dimer). If betrixaban was superior in this sub-group, comprising about half of all the randomized patients, the effect in a larger cohort that also included intermediate-risk patients would be tested, and finally, if both these hurdles were passed, the effect in the whole population would be tested.
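
A minimal sketch, in Python, of how such a hierarchical (gatekeeping) testing sequence proceeds: each successively broader population is formally tested only if the more enriched population before it cleared the significance hurdle. The p-values and the 0.05 threshold below are hypothetical placeholders, not the APEX results.

```python
# Illustrative sketch of a hierarchical (gatekeeping) testing cascade.
# The p-values below are hypothetical placeholders, not the APEX data.

ALPHA = 0.05  # conventional significance threshold

# p-values for the three nested populations, most enriched first
p_values = {
    "cohort 1 (high risk, elevated D-dimer)": 0.04,
    "cohort 2 (high plus intermediate risk)": 0.02,
    "overall population": 0.01,
}

def gatekeeping_cascade(p_values, alpha=ALPHA):
    """Test each population in order; stop at the first non-significant result."""
    passed = []
    for population, p in p_values.items():
        if p < alpha:
            passed.append(population)
        else:
            break  # later populations are never formally tested
    return passed

print(gatekeeping_cascade(p_values))
```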

Such a design has the potential to increase the power for testing the primary hypothesis, and to avoid a common pitfall of late-stage trial design where a large effect in a particular sub-group is masked by a smaller effect in the remaining patients. Even if the population that responds to the drug can be identified after the fact, the kind of post hoc data trawling required to find it is rightly swatted away by the regulators. With the APEX design, if betrixaban had been very effective in the high-risk group but ineffective in the remainder, the positive result in the first cohort would have supported approval, albeit with a narrower label.

The impact of the trial design was therefore under the spotlight when the data was unblinded and the top-line results were revealed.

Unfortunately, betrixaban failed to demonstrate superiority (by the smallest of statistical margins) in the first cohort, and so failed to trigger the cascade of tests in the larger populations.

Arguably, the pharma industry needs more innovation–but innovation in late-stage trial design has, it seems, cost Portola (and its investors) dear, as the stock fell 30% on the day the results were announced.

The problem lies in the second of the two hypotheses implicit in the trial design. With the enrichment design, APEX was testing not only the primary hypothesis that betrixaban was superior to standard-of-care, but also the additional hypothesis that the effect was materially larger in the high-risk population with elevated D-dimer.

If the drug was merely equally effective in all patients (no bad thing for a drug), then the smaller size of the sub-group would reduce the statistical power of the trial to the point where that effect size was no longer significant. And that is exactly what happened in APEX: the relative risk of a thrombotic event in the betrixaban-treated patients, compared to standard-of-care, was about 0.8 in all three groups. This effect size was highly significant in the whole population of more than 7,000 (p=0.006), and even significant in the middle-sized "cohort 2" (p=0.03). But in the smallest "cohort 1" of high-risk individuals, the same effect size just failed to reach significance (p=0.054).
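
To see how the same relative risk can clear the significance bar in a large population yet miss it in a half-sized cohort, here is an illustrative calculation using a standard two-proportion z-test. The event rates and cohort sizes are assumptions chosen purely for illustration (a 7% event rate on standard-of-care, reduced to 5.6% on treatment, i.e. a relative risk of 0.8); they are not taken from APEX, and the helper function is not part of the trial's statistical analysis plan.

```python
# Illustrative two-proportion z-test: the same relative risk (~0.8) that is
# clearly significant in a large trial can miss significance in a half-sized cohort.
# Event rates and sample sizes are assumed for illustration, not APEX data.
from math import sqrt
from scipy.stats import norm

def two_proportion_p(events_a, n_a, events_b, n_b):
    """Two-sided p-value for a difference in event rates between two arms."""
    p_a, p_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return 2 * norm.sf(abs(z))

# Assume a 7% event rate on standard-of-care, reduced to 5.6% (relative risk 0.8).
for n_per_arm in (3500, 1750):  # whole population vs. a half-sized, enriched cohort
    control_events = round(0.07 * n_per_arm)
    treated_events = round(0.056 * n_per_arm)
    p = two_proportion_p(treated_events, n_per_arm, control_events, n_per_arm)
    print(f"n per arm = {n_per_arm}: p = {p:.3f}")
```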

The data may have fallen just the wrong side of the significance line, but the impact is likely to be significant. FDA have rightly been rigid in their interpretation of statistical analysis plans in late-stage trials of drugs for large indications. The plan called for a cascade of significance testing, and the drug failed at the first hurdle. Once you allow “narrow failures” to pass, the whole fabric of the drug-testing framework is called into doubt.

And in the case of betrixaban, where two other drugs with the same mechanism of action have already been approved, it is hard to justify making a special case on the grounds of patient need. It is difficult, therefore, to imagine a positive assessment from the key US regulator.

While the FDA may rightly reject betrixaban on the strength of this data, that masks a very obvious truth: Taking all the data together, it is clear betrixaban is superior to standard-of-care (putting aside any arguments as to whether the non-significant 20% increase in bleeding events cancels out the significant 20% reduction in thrombosis). Without the enrichment design, Portola would have scored a clear success with p=0.006 in the whole study population.

Even the most conservative statistical treatment of the data, a Bonferroni correction for the three tests performed on the three populations, would have left the primary endpoint statistically significant: the whole-population p=0.006 sits comfortably below the adjusted threshold of 0.05/3, roughly 0.017. Portola will no doubt make such arguments, but will nevertheless face an uphill battle with the regulator.
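
As a back-of-the-envelope check, using the p-values quoted above and a conventional 0.05 family-wise error rate, the Bonferroni arithmetic looks like this (a sketch for illustration only):

```python
# Bonferroni correction for three tests: each individual test must clear alpha / 3.
alpha = 0.05
n_tests = 3
adjusted_threshold = alpha / n_tests  # roughly 0.0167

# p-values as reported in the top-line APEX results
p_values = {"cohort 1": 0.054, "cohort 2": 0.03, "overall population": 0.006}
for population, p in p_values.items():
    verdict = "significant" if p < adjusted_threshold else "not significant"
    print(f"{population}: p = {p} -> {verdict} after Bonferroni correction")
```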

The blame for such an outcome lies squarely at the door of the enrichment trial design. And the whole episode is an important lesson for all of us designing trials: There is no free lunch in statistics. You can direct the power of your study where you want, but you cannot create power out of thin air. Worse still, as the design becomes more complex, the hypotheses actually being tested become ever cloudier.

How sure were the trial designers that betrixaban would be more effective in the high-risk sub-group? Was that based on a dangerous post hoc analysis of the Phase 2 data? Did they realize just how serious the impact would be if that hypothesis proved wrong while the primary hypothesis (that betrixaban was superior to standard-of-care) was robustly true?

The take-home message, as in so much of science, is that simpler is better. Only admit complexity if you are certain of the hypothesis implicit in the more complicated design. As Portola’s investors are painfully learning, adding in a second hypothesis only serves to introduce a second way to fail. Far from delivering two for the price of one (a chance to win in the limited population, and then a chance to double up in the whole population), the enrichment design has undermined the entire APEX study.

It is interesting to speculate whether a global pharmaceutical company would have taken the same gamble with such a trial. Certainly neither Pfizer and BMS, with apixaban, nor Johnson & Johnson, with rivaroxaban, sought much innovation in the pivotal trials of the earlier Factor Xa inhibitors. But strategies such as adaptive designs and enrichment protocols are becoming more common, so past choices do not necessarily indicate current thinking in these companies.

But a theme is definitely emerging: twice in the last year DrugBaron has praised global pharmaceutical companies for their trial design. By comparing their psoriasis treatment Cosentyx directly with the recently approved Stelara (from JnJ), Novartis was able to super-power their commercial launch. And by clever selection of the comparison arm, the pivotal trial of Entresto arguably also magnified the apparent superiority of the new drug. While this sleight of hand may have contributed to the slower-than-expected commercial performance of Entresto, in both cases trial design maximized the apparent efficacy of the drug candidates being tested (and both yielded straightforward nods from the FDA).

By contrast, smaller biotechs are struggling to generate the data needed to support approval of their agents–and at least some of this difficulty is down to clinical trial design. Portola likely does have an approvable drug, but arguably just not the data package to prove that. The same may be true of Sarepta with their DMD drug eteplirsen.

For sure, trial design issues are not the exclusive preserve of small companies. Just recently, Lilly changed the primary endpoint of its vast Alzheimer’s disease trial while that trial is still recruiting, and is guilty of changing the hypothesis it is testing with that anti-amyloid drug with each new failure. But investors beware: there is very probably an additional hidden risk in smaller companies performing late-stage clinical trials, beyond the simple question of whether their drug candidates are effective and safe.