Optimizing time and computing resource consumption across a petabyte of processed data
May 31, 2019 from Activeeon
Founded in 1946, the Institut National de la Recherche Agronomique (INRA) is the leading agricultural research institute in Europe and the second largest in the world in terms of the number of projects carried out by its researchers and the number of scientific publications. Its teams work in research areas that range from food quality and agricultural sustainability to the preservation of the environment, biodiversity and ecosystems. To carry out its missions, INRA uses state-of-the-art technologies.
MetaGenoPolis brings together researchers, engineers, laboratory technicians, bioinformaticians, bio-analysts, statisticians, mathematicians, microbiologists and a doctor. Through the implementation of advanced metagenomic technologies, the mission of this INRA platform is to understand the impact of the intestinal microbiota, i.e. all microorganisms (bacteria, archaea, viruses, fungi) found in the intestine, on human and animal health.
MetaGenoPolis works on human stool samples, extracting the microbial DNA and sequencing it. Each sample yields 20 million short sequences that must then be assembled like a puzzle to reconstruct genes and genomes and, finally, to establish the sample's microbial profile (i.e. the microbial species present and their abundances).
“Our database now includes the sequencing results of almost 20,000 samples,” explains Nicolas Pons, research engineer at INRA and head of the MetaGenoPolis bioinformatics platform. “This represents a total of 1 petabyte of data that we must store and process locally, i.e. 1 million billion bytes. That’s considerable!”
“To build microbial profiles for each individual, we rely on catalogues of genes and microbial species representative of ecosystems,” he continues. “In the human intestine alone, there are nearly 10 million genes.”
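To give a sense of scale, the figures quoted above can be combined in a quick back-of-envelope calculation. The sketch below is only illustrative: it assumes decimal units (1 petabyte = 10^15 bytes, matching "1 million billion bytes"), rounds "almost 20,000 samples" to exactly 20,000, and treats the whole petabyte as evenly spread across samples, which the article does not state.

```python
# Back-of-envelope scale estimate from the figures quoted in the article.
# Assumptions (not from the source): round numbers, decimal units, and the
# full petabyte divided evenly across samples.

PETABYTE = 10**15   # bytes, "1 million billion bytes"
GIGABYTE = 10**9    # bytes

total_bytes = 1 * PETABYTE
samples = 20_000                # "almost 20,000 samples", rounded up
reads_per_sample = 20_000_000   # "20 million short sequences" per sample

# Average storage footprint per sample under the even-split assumption.
bytes_per_sample = total_bytes / samples

# Total number of short sequences across the whole database.
total_reads = samples * reads_per_sample

print(f"Average data per sample: {bytes_per_sample / GIGABYTE:.0f} GB")
print(f"Total short sequences:   {total_reads:.1e}")
```

Under these assumptions, each sample accounts for roughly 50 GB on average, and the database holds on the order of 400 billion short sequences.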
Depending on the studies entrusted to MetaGenoPolis, the number of samples to be processed at the same time can reach several hundred or even several thousand.
MetaGenoPolis therefore needs a digital infrastructure and storage solutions that are particularly reliable and adapted to these complex operations. To do this, the platform relies on the ProActive solution developed by ActiveEon to orchestrate the IT processing of data from the analysis of microbiota samples. ProActive distributes the processing in a way that optimizes not only execution time but also computing resource consumption.
“ProActive allows us to organize the different computing tasks on a cluster, our group of servers on the network, while providing us with a workflow engine that makes it easier for bio-analysts to implement certain processes by optimizing the workflow and access to specific resources. It is completely adapted to our bio-informatics and bio-statistics processing needs.”
To learn more about INRA MetaGenoPolis research visit the following website: