Big Data and Translational Medicine: Some Examples. Juan Luis Fernández Martínez, Dept. of Mathematics, Universidad de Oviedo
Biomedical Robots and Personalized Medicine
VISION: TO FILL THE GAP. Build bridges between biomedical research and clinical practice (and therefore personalized treatment), trying to remedy the existing disconnect between the two.
SECTORS/MARKETS. UPSTREAM: pharmaceutical and biomedical companies. DOWNSTREAM: hospitals and clinics.
WHERE WE ARE HEADED. Creation of a technology-based company (EBT) in Asturias that, starting from its own technologies, will develop projects/technology and provide services in different branches of personalized medicine in a global market.
BIOTEC branch. Solving rare diseases and cancer requires intelligent data analysis and the discovery of new biomarkers for early diagnosis and the creation of personalized therapies.
PHARMA branch. "For every 5,000 to 10,000 potential drug candidates that enter the discovery research stage, only about 2.5 to 5% will make it through to the preclinical phase. Of that percentage, only 0.05 to 0.1% will enter the clinical trial testing phase" (source: www.acrohealth.org). (Re)design of drugs minimizing adverse effects. Stratification of patients according to their response.
MED (CLINICAL) branch. Need for ROBUST CLINICAL PREDICTIVE MODELS. Need for the design and creation of decision support systems, personalized treatment, optimal choice of drugs based on genetics (predictive signatures), and minimization of unnecessary suffering and side effects.
Proprietary Technology
GENEPI platform (Biotec and Pharma branches): drug redesign; treatment optimization (personalized therapy); minimization of toxicities.
COLUMBUS platform (medical/clinical imaging branch): analysis of medical uncertainties; design of automatic decision support systems.
Projects in the genesis/development phase
FINISTERRAE project: rare and neurodegenerative diseases.
Surgical risk estimation project (computing NSQIP-type factors, National Surgical Quality Improvement Program, to predict operative risk in hospitals): http://riskcalculator.facs.org/
The challenges
1. Big databases: data acquisition advances faster than knowledge discovery.
2. Noise and incomplete data sampling.
3. Abuse of statistical methods.
4. Scientific ghettos.
The solution consists of building multidisciplinary teams that make advanced use of machine learning, optimization and statistics to design biomedical robots that are able to learn dynamically from genomic data and to predict with uncertainty.
Uncertainty in prediction/classification. UNCERTAINTY IS ALWAYS PRESENT IN DECISION-MAKING APPROACHES. http://www.blueprismtech.com/
How to design biomedical robots?
Model reduction framework.
BIOMEDICAL ROBOT: optimization and sampling techniques; machine learning techniques; dynamic learning (no fixed rules); prediction with uncertainty; consensus decision making and risk analysis.
Biomedical robots
[Diagram: the TRAINING DATA and a new sample feed Biomedical Robot 1, Biomedical Robot 2, ..., Biomedical Robot n, which produce Prediction 1, Prediction 2, ..., Prediction n; the predictions are combined by consensus/ensemble learning.]
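The consensus step in the diagram above can be illustrated with a minimal sketch. It assumes each "robot" is simply a callable that maps a sample to a class label, and that the consensus is a plain majority vote whose vote fractions serve as a crude uncertainty measure; the slides do not specify the actual consensus rule, and the names `consensus_predict` and the toy robots are hypothetical.

```python
from collections import Counter


def consensus_predict(robots, sample):
    """Majority vote over a list of classifier callables (assumed rule).

    Returns the winning label and the fraction of votes per label,
    which gives a rough picture of how much the robots disagree.
    """
    votes = [robot(sample) for robot in robots]
    counts = Counter(votes)
    label, _ = counts.most_common(1)[0]
    fractions = {lab: n / len(votes) for lab, n in counts.items()}
    return label, fractions


# Hypothetical toy robots: each one thresholds the sample differently.
robots = [
    lambda x: "high" if x[0] > 1.0 else "low",
    lambda x: "high" if x[1] > 0.5 else "low",
    lambda x: "high" if x[0] + x[1] > 2.0 else "low",
]

label, fractions = consensus_predict(robots, (1.2, 0.4))
print(label, fractions)
```

Reporting the vote fractions alongside the label is one simple way to attach an uncertainty estimate to each prediction; weighted votes or probabilistic outputs would be natural alternatives.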
Genetics and early diagnosis
Design of personalized microarrays.
[Diagram: genetic data feed our ROBOT, which supports early diagnosis, treatment optimization and minimization of toxicities.]
DIAGNOSIS + UNCERTAINTY
Some success stories
Prediction of chronic fatigue (at baseline) in prostate cancer patients undergoing radiotherapy.
Biomarker analysis in chronic lymphocytic leukemia (expression arrays, mutations, CNV regions).
Clinical models for predicting the need for treatment and the development of autoimmune diseases in CLL.
Clinical models for predicting response to treatment in Hodgkin lymphoma.
Differential expression analysis in pancreatic cancer/pancreatitis.
Nephrotic syndrome and respiratory viruses in children.
Genetic signatures for predicting drug response.
Analysis of genetic pathways affected by drug administration.
Estimation of the histological grade of triple-negative breast cancers.
TNBC cancers
LEYCA images (Pathological Anatomy): histological grade (GH) 2 and GH 3.
DESIGN of a robot that advises on the determination of the histological grade. Treatment optimization.
Toxicity Analysis
Gene expression histograms
[Figure, two panels: gene expression histograms, and histograms after gene selection for HF and LF samples. Horizontal axis: gene expression (log 2); vertical axis: relative frequency (%).]
Fisher's ratio curve
Fisher's ratio: $FR = (\mu_1 - \mu_2)^2 / (\sigma_1^2 + \sigma_2^2)$
Fold change: $fc = \log_2(\mu_1 / \mu_2)$
[Figure panels: Fisher's ratio vs. gene index; LOOCV predictive accuracy (%) vs. gene index; a panel with fold change on the horizontal axis.]
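As a worked illustration of the two formulas above, the following sketch computes Fisher's ratio and the fold change per gene and ranks genes by decreasing Fisher's ratio. The gene names and expression values are invented toy data, and the use of the population variance is an assumption (the slide does not say which variance estimator is used).

```python
import math
from statistics import mean, pvariance


def fisher_ratio(class1, class2):
    """FR = (mu1 - mu2)^2 / (sigma1^2 + sigma2^2) for one gene.

    Assumes population variances and a non-zero denominator.
    """
    return (mean(class1) - mean(class2)) ** 2 / (
        pvariance(class1) + pvariance(class2)
    )


def fold_change(class1, class2):
    """fc = log2(mu1 / mu2); assumes strictly positive class means."""
    return math.log2(mean(class1) / mean(class2))


# Hypothetical toy data: gene -> (values in class 1, values in class 2).
data = {
    "geneA": ([8.0, 9.0, 10.0], [2.0, 3.0, 4.0]),
    "geneB": ([5.0, 6.0, 7.0], [5.0, 7.0, 6.0]),
    "geneC": ([2.0, 4.0, 6.0], [1.0, 2.0, 3.0]),
}

# Rank genes from most to least discriminatory according to Fisher's ratio.
ranked = sorted(data, key=lambda g: fisher_ratio(*data[g]), reverse=True)

for gene in ranked:
    c1, c2 = data[gene]
    print(gene, round(fisher_ratio(c1, c2), 3), round(fold_change(c1, c2), 3))
```

A ranking like this gives the "gene index" ordering along which a predictive-accuracy curve can then be evaluated, by classifying with the first k genes for increasing k.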
A) PCA (14 most discriminatory genes)
[Figure: samples projected onto PCA 1 (horizontal axis) and PCA 2 (vertical axis), each point labeled with its sample identifier.]
95% predictive accuracy in training (LOOCV). 77% in blind validation (the misclassified samples were considered by the experts to be "behavioral outliers").
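The leave-one-out cross-validation (LOOCV) accuracy quoted above can be sketched as follows. The classifier here is a nearest-centroid rule chosen only for brevity; the slides do not state which classifier was used, and the samples (with HF/LF labels borrowed from the histogram slide) are invented toy data.

```python
from statistics import mean


def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x (assumed classifier)."""
    centroids = {}
    for label in set(train_y):
        rows = [row for row, y in zip(train_X, train_y) if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))

    return min(centroids, key=lambda lab: squared_distance(centroids[lab]))


def loocv_accuracy(X, y):
    """Hold out each sample once, train on the rest, and count correct predictions."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i]) == y[i]:
            hits += 1
    return hits / len(X)


# Hypothetical toy data: two selected genes per sample.
# The last LF sample sits among the HF samples, like an outlier.
X = [
    [9.0, 1.0], [8.5, 1.5], [9.5, 0.5],
    [2.0, 6.0], [2.5, 5.5], [1.5, 6.5],
    [8.0, 2.0],
]
y = ["HF", "HF", "HF", "LF", "LF", "LF", "LF"]

accuracy = loocv_accuracy(X, y)
print(f"LOOCV accuracy: {accuracy:.1%}")
```

In this toy set only the outlying sample is misclassified, which mirrors how a few atypical samples can account for the gap between training and blind-validation accuracy.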
CLL: effect of the main mutations observed in patients with chronic lymphocytic leukemia (CLL Wheel).
LESSONS LEARNED
Every learning problem has its own alphabet.
Once the alphabets of a problem are found, the classification becomes linearly separable and the classifier is able to generalize with high accuracy.
The alphabet in a genetic prediction problem is closely related to the biomedical genesis/explanation.
Finisterrae Project: rare and neurodegenerative diseases
Enable drug discovery and hypothesis generation from genetic data in a single step.
Proofs of concept: IBM-PM, SE, ALS, DM.
Optimize drug design (pharmacogenomic and pharmacokinetic aspects).
IBM/PM
Pathways: Allograft Rejection; Influenza A; Class I MHC Mediated Antigen Processing and Presentation; Staphylococcus Aureus Infection; Interferon Signaling; Immune Response IFN Alpha/beta Signaling Pathway; Phagosome; Tuberculosis; Cell Adhesion Molecules (CAMs); Epstein-Barr Virus Infection; TNF Signaling; Leishmaniasis; Immune Response Role of DAP12 Receptors in NK Cells; Rheumatoid Arthritis; Immune Response Antigen Presentation By MHC Class II; IL-2 Pathway; Toll-like Receptor Signaling Pathway; Actin Nucleation By ARP-WASP Complex; Immune Response IFN Gamma Signaling Pathway; Viral Carcinogenesis; Immune Response NFAT in Immune Response; Type II Interferon Signaling (IFNG); HTLV-I Infection; ICos-ICosL Pathway in T-Helper Cell; IL12-mediated Signaling Events; Complement and Coagulation Cascades.
Compounds: Serine; Tyrosine; Retinoic Acid; Cysteine; Actinomycin D; Arginine; Dexamethasone; Oligonucleotide; Polyinosinic-polycytidylic Acid; Threonine; Cycloheximide; Cd 437; Cyclosporin A; Vegf; Oxygen.
ALS
Pathways: Ceramide Pathway; Wybutosine Biosynthesis; Pilocytic Astrocytoma; G-protein Signaling_Rap2B Regulation Pathway; Lysosphingolipid and LPA Receptors; Peroxisome; Nucleotide-binding Domain, Leucine Rich Repeat Containing Receptor (NLR) Signaling Pathways; S-1P Stimulated Signaling; Mitotic Telophase/Cytokinesis; Immune Response Function of MEF2 in T Lymphocytes; Syndecan-1-mediated Signaling Events; Ran Pathway; Glioblastoma Multiforme; Cytosolic Sensors of Pathogen-associated DNA; Hypertrophy Model; Insulin Receptor Recycling; MAPK Signaling Pathway; EBV LMP1 Signaling; Cellular Roles of Anthrax Toxin; Androgen Signaling.
Compounds: S-adenosylmethionine; 3-(3-hydroxyphenyl)propionic Acid; (1s,2s,3r,6r)-4-(hydroxymethyl)-6-(octylamino)cyclohex-4-ene-1,2,3-triol; 3-{6-[(8-hydroxy-quinoline-2-carbonyl)-amino]-2-thiophen-2-ylhexanoylamino}-4-oxo-butyri Acid; 1-methyl-3-trifluoromethyl-1h-thieno[2,3-c]pyrazole-5-carboxylic Acid (2-mercapto-ethyl)-amide; Alpha-cyano-4-hydroxycinnamate; (1r,2r,3r,4s,5r)-4-(benzylamino)-5-(methylthio)cyclopentane-1,2,3-triol; 1,5-dideoxy-1,5-imino-d-mannitol; 9-deazaadenine; 5-thio-a/b-d-mannopyranosylamine; 5-fluoro-beta-l-gulosyl Fluoride; Ghavamiol; Heparin Disaccharide Iii-s; Heparin Disaccharide I-s; 2-deoxy-2-fluoro-alpha-d-mannosyl Fluoride; M-coumaric Acid; Z-yvad-fmk; (s)-fty720-phosphate; 3-[2-(2-benzyloxycarbonylamino-3-methyl-butyrylamino)-propionylamino]-4-oxo-pentanoic Acid; Bym338.
Breast Cancer
http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-1049/
The purpose of this study was to search for molecular signatures characterizing clinical T1 and T2 tumors and to better identify patients at high risk of relapse and/or metastasis. In total 46 samples, 27 T1 and 19 T2 infiltrating ductal carcinomas, were included.
[Figure panels: samples vs. probes (x 10^4); GFR vs. fold change.]
85% accuracy with 5 GENES:
SETDB1   1.00   1.28
C1D      0.53   0.76
RGMA     1.12   1.22
TBC1D3   1.38   1.99
LIF      0.97   1.02
Knowledge discovery and hypothesis generation
SETDB1: The SETDB1 histone methyltransferase is recurrently amplified in and accelerates melanoma, underscoring the role of chromatin factors in regulating tumorigenesis (http://www.ncbi.nlm.nih.gov/pmc/articles/pmc3348545/).
C1D: The protein encoded by this gene is a DNA binding and apoptosis-inducing protein and is localized in the nucleus. It is also a Rac3-interacting protein which acts as a corepressor for the thyroid hormone receptor. This protein is thought to regulate TRAX/Translin complex formation (http://www.ncbi.nlm.nih.gov/gene/10438).
RGMA: The impact of RGMs on cancer has not been well explored. However, given the pivotal role of RGMs in BMP signalling and neogenin actions, there has been some recent interest in exploring the role of RGMs in cancer. RGMA suppresses cell proliferation, migration and invasion, and also increases apoptosis in colorectal cancer cells. It was identified as a candidate for tumour suppression in classical Hodgkin's lymphoma (cHL) using array comparative genomic hybridization. The perturbed expression of RGMs is associated with breast cancer progression and poor prognosis (Repulsive Guidance Molecules (RGMs)).
TBC1D3: TBC1D3, a hominoid-specific gene, delays IRS-1 degradation and promotes insulin signaling by modulating p70 S6 kinase activity. TBC1D3 is a hominoid-specific gene previously identified as an oncogene in breast and prostate cancers (http://www.ncbi.nlm.nih.gov/pmc/articles/pmc3278430/).
LIF: LIF promotes tumorigenesis and metastasis of breast cancer through the AKT-mTOR pathway (http://www.ncbi.nlm.nih.gov/pmc/articles/pmc3996668/).
Lung Cancer
http://www.ebi.ac.uk/arrayexpress/ http://www.ebi.ac.uk/efo/efo.owl
Global gene expression of 13 frozen samples, 6 from typical and 7 from atypical surgically resected primary lung carcinomas (54675 probes).
100% accuracy with only 1 GENE: SLC7A14.
Diseases associated with SLC7A14 include retinitis pigmentosa 68 and retinitis pigmentosa. GO annotations related to this gene include amino acid transmembrane transporter activity.
Many different cancers have been associated with CAR (cancer-associated retinopathy). Small cell lung carcinoma, breast cancer, and gynecologic cancer are the most common cancers associated with CAR. However, CAR has been found in other types of lung cancer, colon cancer, melanoma, skin squamous cancer, kidney cancer, pancreatic cancer, lymphoma, basal cell tumor, and prostate cancer. Keltner et al. developed an autoimmune theory in 1983 when they found antibodies against retinal photoreceptors in a patient with lymphoma who developed acute vision loss and retinal degeneration. Autoimmunity occurs when tumor antigens trigger an immune response from the host, which creates antibodies that cross-react with a retinal protein. This leads to cell apoptosis/death and retinal degeneration.
[Figure: GFR vs. fold change.]
Summary
Big medical data analysis, if done correctly, offers the opportunity to greatly improve medical methods of prognosis, prediction and diagnosis.
Biomedical robots can be used in different pathologies to: discover new knowledge; optimize decisions and treatment; find new therapeutic targets; economize current medical resources and generate new economic resources.
The methods are agnostic, that is, very different kinds of data can be used in the decision-making approach.