@@ -66,22 +66,22 @@ These stresses have a significant influence on these communities and lead to an
For efficient soil conservation, it is essential to detect the emergence and trends of these changes at an early stage.
For the past 15 years, the RMQS ("Réseau de Mesures de la Qualité des Sols" - French Soil Quality Monitoring Network) has been meeting these goals of long-term assessment and monitoring of soil quality in France \cite{imbert_rmqs_2021}.
The RMQS network is based on the monitoring of 2,240 sites representative of French soils and their land use, spread over the whole French territory along a systematic square grid of 16~km $\times$ 16~km cells~\cite{imbert_rmqs_2021}.
Soil samples, measurements and observations are taken every 10 to 15 years at the center of each grid cell.
As such, the RMQS is probably one of the most intensive and extensive sampling strategies at the national scale.
The first sampling campaign in mainland France took place from 2000 to 2009, while the second campaign is currently ongoing (2016 to 2027).
The first campaign focused on assessing soil contamination and made it possible to map key soil parameters (28 variables, e.g. pH, organic carbon content or texture) as well as several trace metal elements and persistent organic pollutants~\cite{dequiedt_rmqs_2020}.
Thanks to various molecular tools, a substantial body of scientific knowledge on soil microbial communities has already been produced~\cite{dequiedt_biogeographical_2011, terrat_mapping_2017, karimi_biogeography_2018, karimi_biogeography_2020}.
Moreover, several technical developments were carried out to standardize each method and the whole analytical process~\cite{terrat_molecular_2012, terrat_meta-barcoded_2015, terrat_reclustor_2020, djemiel_biocom-pipe_2020}.
We therefore reorganized all available microbiological RMQS datasets (i.e. molecular microbial biomass, fungal:bacterial density ratio, bacterial richness, bacterial taxonomic characterization, bacterial habitat definition and beta-diversity through an OTU matrix) to improve their reusability, with a focus on easing their linkage with other soil measurements from the RMQS.
Indeed, although France is only the 50th largest country in the world by total area, it is the third-largest European country and exhibits the third-highest pedological diversity in the world according to the WRB classification~\cite{minasny_global_2010}.
France is also an extremely diverse territory in terms of land use, climate conditions and biodiversity~\cite{karimi_biogeography_2020}.
Consequently, these datasets provide an extensive and relevant playground for improving our understanding of soil microbial communities and their importance in various soil functions.
These datasets will be further enriched through current and future analyses (e.g. fungal datasets, the second RMQS sampling campaign), and we expect them to help the whole scientific community improve its knowledge of soil microbial communities.
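
The systematic grid design can be illustrated with a minimal Python sketch that assigns projected coordinates to a 16~km $\times$ 16~km cell and returns the center of that cell, where sampling takes place. The use of a projected metric coordinate system and the grid origin (\texttt{ORIGIN\_X}, \texttt{ORIGIN\_Y}) are hypothetical assumptions for illustration; they are not taken from the RMQS specification.

```python
# Minimal sketch: map projected coordinates (metres) onto a systematic
# 16 km x 16 km square grid and return the cell index and cell center.
# ASSUMPTIONS (hypothetical, not the RMQS specification): coordinates are in
# a projected metric CRS and the grid is anchored at (ORIGIN_X, ORIGIN_Y).
from typing import Tuple

CELL_SIZE_M = 16_000  # 16 km cell side, as stated in the text
ORIGIN_X = 0.0        # hypothetical grid origin (easting, metres)
ORIGIN_Y = 0.0        # hypothetical grid origin (northing, metres)


def cell_index(x: float, y: float) -> Tuple[int, int]:
    """Return the (column, row) index of the grid cell containing (x, y)."""
    # Floor division keeps indices consistent on both sides of the origin.
    col = int((x - ORIGIN_X) // CELL_SIZE_M)
    row = int((y - ORIGIN_Y) // CELL_SIZE_M)
    return col, row


def cell_center(col: int, row: int) -> Tuple[float, float]:
    """Return the coordinates of the center of cell (col, row)."""
    cx = ORIGIN_X + (col + 0.5) * CELL_SIZE_M
    cy = ORIGIN_Y + (row + 0.5) * CELL_SIZE_M
    return cx, cy


if __name__ == "__main__":
    col, row = cell_index(650_000.0, 6_862_000.0)
    print(col, row, cell_center(col, row))
```

Computing the center from the integer cell index, rather than from the raw coordinates, guarantees that every point in a cell maps to the same sampling location.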
\section*{Methods}
...
...
@@ -125,7 +125,8 @@ Crude DNA extracts were resolved by electrophoresis in a 1\% agarose gel, staine
Dilutions of calf thymus DNA (Bio-Rad, Hercules, CA, USA) were included in each gel, and a standard curve (500, 250, 125, 62.5 and 31.25~ng of DNA) was used to estimate the final DNA concentration in the crude extracts~\cite{ranjard_sampling_2003}.
Ethidium bromide fluorescence intensity was integrated with ImageQuant software (Molecular Dynamics, Evry, France).
The robustness of this method to soil impurities has been confirmed previously~\cite{ranjard_sampling_2003}.
After final quantification, values lower than 2.5~µg DNA~g$^{-1}$ soil were considered out of range and flagged as such in the generated file.
For example, these soils were excluded from the analyses published in 2011~\cite{dequiedt_biogeographical_2011}.
\subsection*{DNA purification}
Crude DNA was then purified for PCR amplification.
...
...
@@ -179,8 +180,8 @@ Here, all high-quality reads were compared to SILVA (r132) database using the US
Taxonomic assignments are provided for all analyzed samples at each taxonomic level (phylum, class, order, family and genus).
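
Taxonomic summaries at a given level can be obtained by summing OTU read counts per taxon. The minimal Python sketch below assumes a hypothetical in-memory layout (an OTU table mapping samples to OTU read counts, and a taxonomy mapping OTUs to their assignment at each level); it does not reflect the actual file format of the data records.

```python
# Minimal sketch: aggregate per-sample OTU read counts at one taxonomic level.
# ASSUMPTIONS (hypothetical layout): an OTU table {sample: {otu: reads}} and a
# taxonomy {otu: {level: name}}; OTUs without an assignment are "unclassified".
from collections import Counter, defaultdict

LEVELS = ("phylum", "class", "order", "family", "genus")


def aggregate(otu_table, taxonomy, level):
    """Return {sample: Counter(taxon -> reads)} at the requested level."""
    if level not in LEVELS:
        raise ValueError(f"unknown taxonomic level: {level}")
    result = defaultdict(Counter)
    for sample, counts in otu_table.items():
        for otu, reads in counts.items():
            taxon = taxonomy.get(otu, {}).get(level, "unclassified")
            result[sample][taxon] += reads
    return dict(result)


def relative_abundance(counter):
    """Convert read counts into proportions of the sample total."""
    total = sum(counter.values())
    return {taxon: reads / total for taxon, reads in counter.items()}


if __name__ == "__main__":
    table = {"S1": {"otu1": 6, "otu2": 2, "otu3": 2}}
    taxo = {"otu1": {"phylum": "Proteobacteria"}, "otu2": {"phylum": "Acidobacteria"}}
    print(relative_abundance(aggregate(table, taxo, "phylum")["S1"]))
```

Keeping unassigned reads under an explicit "unclassified" label preserves the sample totals, so relative abundances remain comparable across levels.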
\subsubsection*{Details on OTU analysis with ReClustOR}
After ReClustOR refinement of the OTU clustering, a total of 188,030 OTUs were identified, corresponding to an average increase of 39.27~\% in the number of OTUs per sample compared with the first analysis performed without ReClustOR~\cite{terrat_mapping_2017}.
Both cumulative OTU curves (per-sample rarefaction curves and the overall accumulation curve; Figure~\ref{fig:mm-bioinfo}) indicate that the sequencing dataset efficiently captured the microbial communities of the 1,842 soil samples.
The detected richness ranged from 870 to 3,075 OTUs per soil sample, and 72,120 OTUs were represented by a single read.
Only 562 OTUs (less than 0.3~\% of all OTUs) occurred in more than 50~\% of the sampled sites.
This OTU database avoids re-clustering all reads each time new datasets are compared or added, and consequently greatly reduces computational costs.
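
The richness, singleton and occupancy statistics reported above, as well as an OTU accumulation curve, can be computed from an OTU table with a minimal Python sketch such as the following. The table layout is hypothetical, a singleton is assumed here to be an OTU with exactly one read over the whole dataset, and the accumulation curve is averaged over random sample orders (\texttt{n\_perm} and \texttt{seed} are illustrative parameters).

```python
# Minimal sketch: OTU-table statistics (per-sample richness, singleton OTUs,
# OTUs present in more than half of the sites) and an OTU accumulation curve.
# ASSUMPTIONS (hypothetical): the table is {sample: {otu: reads}}; a singleton
# is an OTU with exactly one read over the whole dataset.
import random
from collections import Counter


def summarize(otu_table):
    """Return basic diversity statistics for an OTU table."""
    richness = {s: sum(1 for r in c.values() if r > 0) for s, c in otu_table.items()}
    total_reads, occupancy = Counter(), Counter()
    for counts in otu_table.values():
        for otu, reads in counts.items():
            if reads > 0:
                total_reads[otu] += reads
                occupancy[otu] += 1  # number of sites where the OTU occurs
    n_sites = len(otu_table)
    return {
        "n_otus": len(total_reads),
        "richness_range": (min(richness.values()), max(richness.values())),
        "singletons": sum(1 for r in total_reads.values() if r == 1),
        "widespread": sum(1 for k in occupancy.values() if k > n_sites / 2),
    }


def accumulation_curve(samples):
    """Cumulative number of distinct OTUs as samples are added in order."""
    seen, curve = set(), []
    for counts in samples:
        seen.update(otu for otu, reads in counts.items() if reads > 0)
        curve.append(len(seen))
    return curve


def mean_accumulation_curve(otu_table, n_perm=100, seed=0):
    """Average the accumulation curve over random sample orders."""
    rng = random.Random(seed)
    samples = list(otu_table.values())
    totals = [0] * len(samples)
    for _ in range(n_perm):
        rng.shuffle(samples)
        for i, n in enumerate(accumulation_curve(samples)):
            totals[i] += n
    return [t / n_perm for t in totals]
```

A curve that flattens as samples are added suggests that additional sites contribute few new OTUs, which is how such curves are read as evidence of sufficient sampling.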
...
...
@@ -204,6 +205,7 @@ The 15 splits were determined by 5 environmental variables: soil pH, land use, C
\section*{Data Records}
...
...
@@ -276,7 +278,7 @@ Differences in amplification efficiency between all PCR plates were estimated by
Second, for each plate, the deviation was subtracted from the Ct to obtain an adjusted Ct (Figure~\ref{fig:tv-qpcr}).
Third, the slope and intercept of a master calibration curve were calculated using the adjusted Ct values and concentrations of all standards from all experiments.
Finally, the number of rDNA copies in each environmental sample was calculated from its adjusted Ct and the master calibration curve parameters.
Overall, the Ct correction is well adjusted for 18S but slightly over-adjusted for 16S (Figure~\ref{fig:tv-qpcr}).
\subsection*{Sequencing reliability}
To ensure the comparability of datasets obtained across sequencing platforms and libraries, internal controls were incorporated into each sequencing library.