Gene co-expression network analysis is extremely useful for interpreting complex biological processes. Recent droplet-based single-cell technology routinely generates much larger gene expression data sets, with thousands of samples and tens of thousands of genes. To analyze such large-scale gene-gene networks, remarkable progress has been made in rigorous statistical inference for the high-dimensional Gaussian graphical model (GGM). These approaches provide a formal confidence interval or a p-value, rather than only a single point estimate, for the conditional dependence of a gene pair, and are therefore more desirable for identifying reliable gene networks. To promote their widespread use, we herein introduce an extensive and efficient R package named SILGGM (Statistical Inference of Large-scale Gaussian Graphical Model) that includes four main approaches to statistical inference for the high-dimensional GGM. Unlike existing tools, SILGGM provides statistically efficient inference both on an individual gene pair and on all gene pairs simultaneously. It has a novel and consistent false discovery rate (FDR) procedure in all four methodologies. With its user-friendly design, it provides outputs compatible with multiple platforms for interactive network visualization. Furthermore, comparisons in simulation illustrate that SILGGM is several orders of magnitude faster than the existing MATLAB implementation and further improves on the speed of the already very efficient R package FastGGM. Testing results from the simulated data confirm the validity of all the approaches in SILGGM even in a very large-scale setting, with the number of variables or genes on the order of ten thousand. We have also applied our package to a novel single-cell RNA-seq data set with pan T cells. The results show that the approaches in SILGGM significantly outperform the conventional ones in a biological sense. The package is freely available via CRAN at -project.org/package=SILGGM.
There are some existing software packages for gene co-expression network analysis. For example, the popular R package WGCNA provides functions to construct a gene co-expression network based on marginal correlations. Among partial correlation-based approaches, particularly for large-scale settings, glasso and huge are two widely adopted packages for fast estimation of gene-gene conditional dependence based on the high-dimensional GGM. More recent packages include FastCLIME, flare and XMRF. In contrast to marginal correlation-based approaches and high-dimensional GGM estimation, there are in practice few efficient packages or algorithms for the aforementioned approaches to rigorous statistical inference on partial correlations, which are expected to be more powerful in large-scale gene-gene network analysis. FastGGM is a recently developed package that provides an efficient and tuning-free implementation of B_NW_SL and has made the method computationally feasible for tens of thousands of genes. However, some redundant steps in its algorithm can be further improved, and its matrix-only output format makes the package less user-friendly. Apart from FastGGM, no efficient R package has been proposed for the other related works above, and the expensive computation of a naïve implementation remains a challenge for these approaches.
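The difference between marginal and partial correlation mentioned above can be illustrated with a minimal sketch for three variables. This is the textbook first-order partial correlation formula, not the regression-based high-dimensional procedures that SILGGM or FastGGM implement, and the correlation values are hypothetical rather than taken from any data set.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y after conditioning on z."""
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denominator == 0.0:
        raise ValueError("x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denominator


# Hypothetical correlations among three genes (illustrative values only).
marginal = 0.49
partial = partial_correlation(marginal, r_xz=0.7, r_yz=0.7)
print(f"marginal = {marginal:.2f}, partial given z = {partial:.2f}")
```

In this example the two genes look moderately co-expressed marginally, yet the association essentially vanishes once the third gene is conditioned on, which is the kind of indirect link a marginal correlation network would keep and a partial correlation network would drop.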
We focus on the high-dimensional settings with p (the number of genes) allowed to be far larger than n (the number of subjects). The SILGGM package has one main function SILGGM() with various arguments and its workflow is described in Fig 1.
The source code of the package and a complete reference manual including dependencies, usage of all package functions and associated examples are freely available via CRAN at -project.org/package=SILGGM. The details of package installation are described in S3 Appendix.
The package SILGGM is computationally efficient compared to the MATLAB implementation of GFC_L and the R package FastGGM. Because R is free and more widely used in biological research than MATLAB, which is commercially licensed and therefore less accessible to biologists, the R-based SILGGM should play a more important role in accelerating biological gene network studies. SILGGM is also statistically efficient in both individual and global inference, owing to the theoretical justification of the four approaches and the validation of estimation accuracy in simulation studies. The analytical results from the single-cell data with pan T cells further reflect the statistical efficiency of SILGGM, since the inferred gene networks are more reliable. Moreover, the comprehensiveness of SILGGM gives users a flexible choice of methods depending on the specific purpose of their study. Given its computational feasibility, the analytical reliability of its results and its methodological comprehensiveness, SILGGM can become a valuable and powerful tool for a wide range of biological researchers performing high-dimensional or even genome-wide co-expression network analysis.
In the future, we will add parallel computing to SILGGM to allow users to use multiple clusters for the analysis of larger data sets, since droplet-based single-cell technology will further increase the sample size. In addition, rigorous statistical inference for multiple high-dimensional gene networks is another potential extension of our package, because differential gene network analysis among different cell types or among cells from multiple individuals is receiving growing attention.
Compressed package formats are often preferred because they are easier to manage, transfer and store. For the same reasons, only one or a few artifacts per module are commonly used. However, artifacts can be of any file type and any number of them can be declared in a single module.
In the Java world, common artifacts are Java archives or JAR files. In many cases, each revision of a module publishes only one artifact (like jakarta-log4j-1.2.6.tar.gz, for instance), but some of them publish many artifacts depending on the use of the module (like apache-ant binary and source distributions in zip, gz and bz2 package formats, for instance).
The artifact type is a category of a particular kind of artifact specimen. It is a classification based on the intended purpose of an artifact or why it is provided, not a category of packaging format or how the artifact is delivered.
In some cases the artifact type already implies its file name extension, but not always. More generic types may include several different file formats, e.g. documentation can contain tarballs, zip packages or any common document formats.
Most of the artifacts found in a repository are jars. They can be downloaded and used as is. But some other kinds of artifacts require unpacking after being downloaded and before being used. Such artifacts can be zipped folders and packed jars. Ivy supports that kind of artifact with packaging.
A packaged artifact needs to be declared as such in the module descriptor via the attribute packaging. The value of that attribute defines which kind of unpacking algorithm must be used. Here is the list of currently supported algorithms:
A file mymodule-1.2.3.jar.pack.gz would be downloaded into the cache, and also uncompressed in the cache to mymodule-1.2.3.jar. Then any post resolve task which supports it, like the cachepath, will use the uncompressed file instead of the original compressed file.
It is possible to chain packing algorithms. The attribute packaging of an artifact expects a comma-separated list of packing types, in packing order. For instance, an artifact mymodule-1.2.3.jar.pack.gz can have the packaging jar,pack200, so it would be uncompressed as a folder mymodule-1.2.3.
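The chaining idea can be sketched as follows: because the list is given in packing order, unpacking walks it from last to first. This is only an illustration in Python, not Ivy's implementation; the type names "zip" and "gzip" and the helper functions are hypothetical stand-ins chosen because the Python standard library can handle them.

```python
import gzip
import io
import zipfile


def unpack_gzip(data: bytes) -> bytes:
    """Undo one gzip compression step."""
    return gzip.decompress(data)


def unpack_zip(data: bytes) -> dict:
    """Expand a zip archive into a mapping of entry name to content."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


# Hypothetical registry of unpacking steps, keyed by packing type.
UNPACKERS = {"gzip": unpack_gzip, "zip": unpack_zip}


def unpack_chain(data, packaging: str):
    """Apply the unpacking steps in the reverse of the declared packing order."""
    steps = [step.strip() for step in packaging.split(",") if step.strip()]
    for step in reversed(steps):
        data = UNPACKERS[step](data)
    return data
```

With a packaging value of "zip,gzip", the content was zipped first and gzipped second, so the sketch gunzips first and then expands the archive, mirroring how jar,pack200 ends as an unpacked folder.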
Even more problematic are possible updates of the repository. We know that versions published in such repositories should be stable and not be updated, but we also frequently see that a module descriptor is buggy, or an artifact corrupted. We even sometimes see a new version published with the same name as the preceding one because the previous one was simply badly packaged. This can happen even to the best; it happened to us with Ivy 1.2 :-) But then we decided to publish the new version with a different name, 1.2a. If the repository manager allows such updates, what worked before can break, and so can your build reproducibility.
If you already build your application and its modules using Ivy, it is really easy to leverage your Ivy repository to download your application and all its dependencies on the local filesystem, ready to be executed. If you also put your settings files as artifacts in your repository (maybe packaged as a zip), the whole installation process can rely on Ivy, easing the automatic installation of any version of your application available in your repository!
The packaging instructions are contained in "packager.xml" in a simple XML format. At resolve time this file gets converted into a "build.xml" file via XSLT and then executed using Ant. Therefore, Ant must be available as an executable on the platform. The Ant task executes in a separate Ant project and so is not affected by properties, etc. that may be set in any existing Ant environment in which Ivy is running. However, Ivy will define a few properties for convenience; see the "Properties" listed below.
Setting a resourceURL will cause the resolver to override the URLs for resources specified by the packaging instructions. Instead, all resources will be downloaded from a URL constructed by first resolving the resourceURL pattern into a base URL, and then resolving the resource filename relative to that base URL. In other words, the resourceURL pattern specifies the URL "directory", so it should always end in a forward slash.
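The need for the trailing slash follows from how relative URL resolution works, which the sketch below demonstrates with Python's standard library. It is a simplified illustration, not the resolver's actual code: the token substitution handles only [organisation] and [module], and the host name, organisation, module and file name are hypothetical.

```python
from urllib.parse import urljoin


def resolve_base_url(pattern: str, organisation: str, module: str) -> str:
    """Substitute the pattern tokens to obtain the base URL (simplified)."""
    return pattern.replace("[organisation]", organisation).replace("[module]", module)


def resource_url(pattern: str, organisation: str, module: str, filename: str) -> str:
    """Resolve a resource file name relative to the resolved base URL."""
    return urljoin(resolve_base_url(pattern, organisation, module), filename)


# Hypothetical mirror; note the trailing slash on the first pattern only.
with_slash = "http://mirror.example.com/[organisation]/[module]/"
without_slash = "http://mirror.example.com/[organisation]/[module]"

print(resource_url(with_slash, "acme", "widget", "widget-1.0.zip"))
print(resource_url(without_slash, "acme", "widget", "widget-1.0.zip"))
```

Without the trailing slash, the last path segment is treated as a file name and replaced by the resource name, so the module "directory" silently disappears from the resulting URL.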
Defines a packager resolver which points to the online repository. Builds will occur in a subdirectory of $user.home/.ivy2/packager/build, downloaded resources will be cached in $user.home/.ivy2/packager/cache, and the mirror site organisation/[module]/ will be tried first for all resources.