Volume 18, Issue 12 p. 4847-4861
Research article

Prediction of bacterial associations with plants using a supervised machine-learning approach

Pedro Manuel Martínez-García

Área de Genética, Facultad de Ciencias, Instituto de Hortofruticultura Subtropical y Mediterránea ‘La Mayora', Universidad de Málaga, Consejo Superior de Investigaciones Científicas (IHSM-UMA-CSIC), Málaga, E-29071 Spain

Centro de Biotecnología y Genómica de Plantas (CBGP), Universidad Politécnica de Madrid-Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria, Parque Científico y Tecnológico de la Universidad Politécnica de Madrid. Campus de Montegancedo, Pozuelo de Alarcón, Madrid, 28223 Spain

Emilia López-Solanilla

Centro de Biotecnología y Genómica de Plantas (CBGP), Universidad Politécnica de Madrid-Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria, Parque Científico y Tecnológico de la Universidad Politécnica de Madrid. Campus de Montegancedo, Pozuelo de Alarcón, Madrid, 28223 Spain

Departamento de Biología Vegetal. Escuela Técnica Superior de Ingenieros Agrónomos, Universidad Politécnica de Madrid, Avenida Complutense, 3, Madrid, 28040 Spain

Cayo Ramos

Área de Genética, Facultad de Ciencias, Instituto de Hortofruticultura Subtropical y Mediterránea ‘La Mayora', Universidad de Málaga, Consejo Superior de Investigaciones Científicas (IHSM-UMA-CSIC), Málaga, E-29071 Spain

Pablo Rodríguez-Palenzuela

Corresponding Author

Centro de Biotecnología y Genómica de Plantas (CBGP), Universidad Politécnica de Madrid-Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria, Parque Científico y Tecnológico de la Universidad Politécnica de Madrid. Campus de Montegancedo, Pozuelo de Alarcón, Madrid, 28223 Spain

Departamento de Biología Vegetal. Escuela Técnica Superior de Ingenieros Agrónomos, Universidad Politécnica de Madrid, Avenida Complutense, 3, Madrid, 28040 Spain

For correspondence. E-mail [email protected]; Tel. +34 913364546; Fax +34 91 3363985
First published: 27 May 2016
Citations: 36

[Correction added on 15 July 2016, after first online publication: the first paragraph under the section 'Adhesion, plant cell wall degradation and detoxification as the most predictive features for identifying plant-associated bacteria' has been updated.]

Summary

Recent incidents of fresh produce contamination by human enteric pathogens have resulted in severe food-borne outbreaks, and a new paradigm has emerged stating that some human-associated bacteria can use plants as secondary hosts. As a consequence, there has been growing concern in the scientific community about these interactions, which have not yet been elucidated. Because this is a relatively new area, strategies are lacking to address the problem of food-borne illness caused by the ingestion of fruits and vegetables. In the present study, we performed specific genome annotations to train a supervised machine-learning model that identifies plant-associated bacteria with a precision of ∼93%. The application of our method to approximately 9500 genomes predicted several unknown interactions between well-known human pathogens and plants, and it also confirmed several cases for which evidence has been reported. Factors involved in adhesion, the deconstruction of the plant cell wall and detoxifying activities emerged as the most predictive features. Applying our strategy to sequenced strains involved in food poisoning can serve as a primary screening tool to determine the possible causes of contamination.
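
For readers less familiar with the metric quoted above, precision is the fraction of genomes predicted as plant-associated that truly are plant-associated, i.e. TP / (TP + FP). The following minimal Python sketch illustrates the calculation only; the function name `precision` and the toy labels are hypothetical and are not taken from the study's data or code.

```python
from typing import Sequence


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of genomes predicted as plant-associated (1) that truly are."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    # True positives: predicted plant-associated and actually plant-associated.
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    # False positives: predicted plant-associated but actually not.
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    predicted_pos = true_pos + false_pos
    if predicted_pos == 0:
        raise ValueError("no positive predictions; precision is undefined")
    return true_pos / predicted_pos


# Hypothetical toy labels (NOT data from the study):
# 1 = plant-associated, 0 = not plant-associated.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
print(f"precision = {precision(y_true, y_pred):.2f}")  # prints "precision = 0.80"
```

In this toy case, four of the five positive predictions are correct, giving a precision of 0.80. Precision says nothing about plant-associated genomes the model misses, so it is usually read alongside recall.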