Background: The number of proposed prognostic models for COVID-19, which aim to predict disease outcomes, is growing rapidly. It is not known whether any are suitable for widespread clinical implementation. We addressed this question by independent and systematic evaluation of their performance among hospitalised COVID-19 cases.

Methods: We conducted an observational cohort study to assess candidate prognostic models, identified through a living systematic review. We included consecutive adults admitted to a secondary care hospital with PCR-confirmed or clinically diagnosed community-acquired COVID-19 (1st February to 30th April 2020). We reconstructed candidate models as per their original descriptions and evaluated performance for their original intended outcomes (clinical deterioration or mortality) and time horizons. We assessed discrimination using the area under the receiver operating characteristic curve (AUROC), and calibration using calibration plots, slopes and calibration-in-the-large. We calculated net benefit relative to the default strategies of treating all patients or no patients, and relative to the most discriminating predictor in univariable analyses, drawn from a limited subset of a priori candidates.

Results: We tested 22 candidate prognostic models among a cohort of 411 participants, of whom 180 (43.8%) and 115 (28.0%) met the endpoints of clinical deterioration and mortality, respectively. The highest AUROCs were achieved by the NEWS2 score for prediction of deterioration over 24 hours (0.78; 95% CI 0.73-0.83), and a novel model for prediction of deterioration <14 days from admission (0.78; 0.74-0.82). Calibration appeared generally poor for models that used probability outcomes. In univariable analyses, admission oxygen saturation on room air was the strongest predictor of in-hospital deterioration (AUROC 0.76; 0.71-0.81), while age was the strongest predictor of in-hospital mortality (AUROC 0.76; 0.71-0.81). No prognostic model demonstrated consistently higher net benefit than using the most discriminating univariable predictors to stratify treatment, across a range of threshold probabilities.

Conclusions: Among hospitalised adults with COVID-19, oxygen saturation on room air and patient age are strong predictors of deterioration and mortality, respectively. None of the prognostic models evaluated offers incremental value for patient stratification over these univariable predictors.
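
For readers less familiar with the two headline metrics named in the Methods, the following is a minimal sketch of how discrimination (AUROC) and net benefit can be computed. It is not the study's analysis code. The function names and the toy `risk` and `died` values are hypothetical and chosen only for illustration. Net benefit here uses the standard decision-curve definition, true positives per patient minus false positives per patient weighted by the odds of the threshold probability; whether the study used exactly this form is an assumption.

```python
from typing import Sequence


def auroc(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """Probability that a random event scores higher than a random non-event.

    Ties between an event and a non-event count as half a win.
    """
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(probs: Sequence[float], outcomes: Sequence[int],
                threshold: float) -> float:
    """Net benefit of treating everyone with predicted risk >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcomes)
    tp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(outcomes: Sequence[int], threshold: float) -> float:
    """Net benefit of the default 'treat all' strategy ('treat none' is 0)."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical toy data: predicted risks and observed outcomes (1 = event).
risk = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1]
died = [1, 1, 0, 1, 0, 0, 1, 0]

if __name__ == "__main__":
    print("AUROC:", auroc(risk, died))
    print("Net benefit at 0.5:", net_benefit(risk, died, 0.5))
    print("Treat-all net benefit at 0.5:", net_benefit_treat_all(died, 0.5))
```

A model is useful at a given threshold probability only if its net benefit exceeds both defaults (treat all, and treat none at zero). The comparison reported above applies the same calculation across a range of thresholds, with a single univariable predictor as an additional comparator.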