Optimizing time and computing resource consumption across a petabyte of processed data
May 31, 2019 from Veranika Tsiareshchanka
Founded in 1946, the Institut National de la Recherche Agronomique (INRA) is the leading agricultural research institute in Europe and the second largest in the world in terms of the number of projects carried out by its researchers and the number of scientific publications. Its teams work in research areas that range from food quality and agricultural sustainability to the preservation of the environment, biodiversity and ecosystems. To carry out its missions, INRA uses state-of-the-art technologies.
MetaGenoPolis brings together researchers, engineers, laboratory technicians, bioinformaticians, bio-analysts, statisticians, mathematicians, microbiologists and a doctor. By applying advanced metagenomic technologies, this INRA platform aims to understand the impact of the intestinal microbiota, i.e. all the microorganisms (bacteria, archaea, viruses, fungi) found in the intestine, on human and animal health.
MetaGenoPolis works on human stool samples, extracting the microbial DNA and sequencing it. Each sample yields 20 million short sequences that must then be assembled like a puzzle to reconstruct genes and genomes and, finally, to establish the sample's microbial profile (i.e. the microbial species present and their abundances).
“Our database now includes the sequencing results of almost 20,000 samples,” explains Nicolas Pons, research engineer at INRA and head of the MetaGenoPolis bioinformatics platform. “This represents a total of 1 petabyte of data that we must store and process locally, i.e. 1 million billion bytes. That’s considerable!”
“To build microbial profiles for each individual, we rely on catalogues of genes and microbial species representative of ecosystems,” he continues. “In the human intestine alone, there are nearly 10 million genes.”
Depending on the studies entrusted to MetaGenoPolis, the number of samples to be processed at the same time can reach several hundred or even several thousand.
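To give a sense of scale, the figures quoted above can be combined in a quick back-of-the-envelope calculation. This is only a rough sketch: it assumes a decimal petabyte, rounds "almost 20,000" up to 20,000, and treats the whole petabyte as evenly spread across samples, which the article does not state. The batch size of 1,000 is an illustrative value.

```python
# Rough scale estimate built only from the figures quoted in the article.
PETABYTE = 10**15               # "1 million billion bytes" (decimal petabyte)
SAMPLES = 20_000                # "almost 20,000 samples", rounded up (assumption)
READS_PER_SAMPLE = 20_000_000   # "20 million short sequences" per sample

# Average storage per sample, assuming the data is spread evenly (assumption).
bytes_per_sample = PETABYTE / SAMPLES

# Total number of short sequences across the whole database.
total_reads = SAMPLES * READS_PER_SAMPLE


def batch_footprint_tb(n_samples: int) -> float:
    """Average storage footprint, in terabytes, of a batch of n_samples."""
    return n_samples * bytes_per_sample / 10**12


print(f"Average per sample: {bytes_per_sample / 10**9:.0f} GB")
print(f"Total short sequences: {total_reads:.1e}")
print(f"Batch of 1,000 samples: {batch_footprint_tb(1000):.0f} TB")
```

Under these assumptions, each sample accounts for roughly 50 GB on average, so a study that processes a thousand samples at once touches on the order of 50 TB.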
MetaGenoPolis therefore needs a digital infrastructure and storage solutions that are particularly reliable and suited to these complex operations. To meet this need, the platform relies on the ProActive solution developed by ActiveEon to orchestrate the processing of data from the analysis of microbiota samples. ProActive distributes the processing in a way that optimizes not only execution time but also computing resource consumption.
“ProActive allows us to organize the different computing tasks on a cluster, our group of servers on the network, while providing us with a workflow engine that makes it easier for bio-analysts to implement certain processes by optimizing the workflow and access to specific resources. It is completely adapted to our bio-informatics and bio-statistics processing needs.”
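The general idea of spreading many per-sample tasks over a fixed pool of resources can be illustrated with a minimal sketch. It uses only Python's standard library and is not ProActive's API; `build_profile`, `run_batch` and the sample identifiers are hypothetical names introduced for illustration, and the placeholder task does no real analysis.

```python
from concurrent.futures import ThreadPoolExecutor


def build_profile(sample_id: str) -> dict:
    """Hypothetical stand-in for the per-sample processing step."""
    # A real pipeline would assemble sequences and compute abundances here.
    return {"sample": sample_id, "status": "done"}


def run_batch(sample_ids, max_workers=4):
    """Process every sample while running at most max_workers tasks at once."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the inputs.
        return list(pool.map(build_profile, sample_ids))


# Illustrative batch of ten hypothetical sample identifiers.
results = run_batch([f"sample_{i:04d}" for i in range(10)], max_workers=4)
print(len(results), "samples processed")
```

Capping `max_workers` is the simplest way to trade execution time against resource consumption: a larger pool finishes a batch sooner, a smaller one leaves more of the machine free for other work. A cluster scheduler makes the same kind of trade-off across many servers rather than within one process.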
To learn more about INRA MetaGenoPolis research visit the following website: