Large-scale discovery, analysis, and design of protein energy landscapes

Title:	Large-scale discovery, analysis, and design of protein energy landscapes
Authors:	Ramos Ferrari, Allan Jhonathan; Dixit, Sugyan; Thibeault, Jane; Mario Garcia; Houliston, Scott; Ludwig, Robert; Notin, Pascal; Phoumyvong, Claire; Martell, Cydney; Jung, Michelle D.; Tsuboyama, Kotaro; Carter, Lauren; Arrowsmith, Cheryl; Guttman, Miklos; Rocklin, Gabriel
Publisher Information:	Zenodo
Publication Year:	2025
Collection:	Zenodo
Subject Terms:	Hydrogen Deuterium Exchange-Mass Spectrometry; Protein biophysics; Protein energy landscapes; Protein design
Description:	IMPORTANT: Please register to use these data so that we can continue to release new useful datasets. This will take 10 seconds.
	This repository contains datasets generated for our study on protein energy landscapes using our multiplex hydrogen-deuterium exchange (mHDX) analysis. The datasets include raw and processed HDX data, NMR results, curated subsets, and machine learning splits with interpretable and deep learning-derived features. These resources support various analyses, including protein stability assessment, EX1 kinetics evaluation, and predictive modeling.
	Available Datasets:
	Dataset_0_InitialOrder: Initial DNA sequences from all libraries (15,715 unique sequences).
	Dataset_1_UnfilteredData: Minimally filtered HDX data based on confident identifications and PO score < 50 (8,293 unique sequences).
	Dataset_2_SuccessfulHDX: Proteins passing quality control metrics, including EX1 kinetics (5,778 unique sequences).
	Dataset_3_MeasurablyStable: Proteins reaching full deuteration with ΔGunfold > 2 kcal/mol and passing EX1 kinetics filter (3,590 unique sequences).
	Dataset_4_HDXNMR: HDX-NMR results per condition, including average ΔGopen per position (16 unique sequences).
	Dataset_5_MesophilicThermophilic: Subset of proteins from natural domains classified as mesophilic or thermophilic based on optimal growth temperature (>40°C) (1,637 unique sequences).
	Dataset_6_splits_interpretable: Machine learning splits with interpretable features (3,193 unique sequences).
	Dataset_6_splits_esm2: Machine learning splits with ESM2-derived features (3,465 unique sequences).
	Dataset_6_splits_unirep: Machine learning splits with Unirep-derived features (3,465 unique sequences).
	Dataset_6_splits_saprot: Machine learning splits with SaProt-derived features (3,465 unique sequences).
	Dataset_7_mHDX_cDNA: Subset of Dataset_2 (best PO-scored candidate, EX1 kinetics excluded) overlapping with cDNA proteolysis assay data from Tsuboyama et al. (2023) (4,464 unique sequences).
	Dataset_8_PDFs: Comprehensive plots ...
Document Type:	dataset
Language:	unknown
Relation:	https://zenodo.org/records/14983481; oai:zenodo.org:14983481; https://doi.org/10.5281/zenodo.14983481
DOI:	10.5281/zenodo.14983481
Availability:	https://doi.org/10.5281/zenodo.14983481; https://zenodo.org/records/14983481
Rights:	Creative Commons Attribution 4.0 International ; cc-by-4.0 ; https://creativecommons.org/licenses/by/4.0/legalcode
Accession Number:	edsbas.1C8E0CE2
Database:	BASE
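
Note on the filtering criteria: the selection rules that the Description gives for Dataset_1 through Dataset_3 can be read as simple per-protein predicates. The sketch below is only an illustration of that reading, not the authors' pipeline. The `Measurement` class, all of its field names, and the toy values are hypothetical; the record does not describe the actual file layout or column names. Only the two numeric thresholds (PO score < 50, ΔGunfold > 2 kcal/mol) come from the Description.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """Hypothetical per-protein summary; field names are assumptions."""
    sequence: str
    confident_id: bool      # confidently identified
    po_score: float         # "PO score" as named in the Description
    passes_qc: bool         # quality control metrics, including EX1 kinetics
    passes_ex1: bool        # EX1 kinetics filter
    fully_deuterated: bool  # reached full deuteration
    dg_unfold: float        # unfolding free energy, kcal/mol


def meets_dataset_1(m: Measurement) -> bool:
    # Dataset_1: confident identification and PO score < 50
    return m.confident_id and m.po_score < 50


def meets_dataset_2(m: Measurement) -> bool:
    # Dataset_2: passes quality control metrics
    return m.passes_qc


def meets_dataset_3(m: Measurement) -> bool:
    # Dataset_3: full deuteration, dG_unfold > 2 kcal/mol, EX1 filter passed
    return m.fully_deuterated and m.dg_unfold > 2.0 and m.passes_ex1


# Toy records with made-up values, for illustration only.
toy = [
    Measurement("SEQ_A", True, 12.0, True, True, True, 3.1),
    Measurement("SEQ_B", True, 48.0, True, True, True, 1.4),
    Measurement("SEQ_C", True, 75.0, False, False, False, 0.0),
]

stable = [m.sequence for m in toy if meets_dataset_3(m)]
print(stable)
```

Whether the published datasets are strictly nested (for example, whether every Dataset_3 entry also appears in Dataset_2) is not stated in this record, so the predicates are kept independent rather than chained.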