Chapter 2 Let’s get started
2.1 Installation
First, download the latest version from Bioconductor:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("mixOmics")
Alternatively, you can install the latest development version of the package from GitHub:
install.packages("devtools")
# then load
library(devtools)
install_github("mixOmicsTeam/mixOmics")
The mixOmics package should directly import the following packages:
igraph, rgl, ellipse, corpcor, RColorBrewer, plyr, parallel, dplyr, tidyr, reshape2, methods, matrixStats, rARPACK, gridExtra.
For Apple Mac users: if you are unable to install the imported package rgl, you will need to install the XQuartz software first.
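If an installation problem is suspected, one way to see which of the imported packages listed above are present is sketched below. It relies only on base R; the variable names (deps, installed, missing_deps) are illustrative, and the check looks at the default library paths without loading anything.

```r
# Packages that mixOmics imports, as listed above.
deps <- c("igraph", "rgl", "ellipse", "corpcor", "RColorBrewer", "plyr",
          "parallel", "dplyr", "tidyr", "reshape2", "methods",
          "matrixStats", "rARPACK", "gridExtra")

# Compare against the packages found in the current library paths.
# This does not load any package, so it is safe even if rgl cannot start.
installed <- deps %in% rownames(installed.packages())
names(installed) <- deps

missing_deps <- deps[!installed]
if (length(missing_deps) > 0) {
  message("Not installed: ", paste(missing_deps, collapse = ", "))
} else {
  message("All imported packages were found.")
}
```

A package reported as missing here (rgl in particular on macOS) is a likely candidate to investigate first.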
2.2 Load the package
Check that there is no error when loading the package, especially for the rgl library (see above).
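The loading step can be sketched as follows. The requireNamespace guard and the loaded flag are additions for illustration only, so that the snippet reports a missing installation instead of stopping with an error; in an interactive session, a plain library(mixOmics) call is enough.

```r
# Attach mixOmics only if it is installed; otherwise report the problem.
if (requireNamespace("mixOmics", quietly = TRUE)) {
  library(mixOmics)
  # TRUE when the package is attached to the search path.
  loaded <- "package:mixOmics" %in% search()
} else {
  loaded <- FALSE
  message("mixOmics is not installed; see the installation steps above.")
}
loaded
```

Any warning or error printed while the package is being attached, in particular one that mentions rgl, is worth resolving before going further.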
2.3 Upload data
The examples we give in this vignette use data that are already part of the package, which means that the objects are stored as lists. However, you do not need to store your own data as lists; data frames will do. To upload your own data, first check that your working directory is set, then read your data from a .csv file, either by using File > Import Dataset in RStudio (be mindful that this option may cause problems with row and column name headers) or via one of these command lines:
# from csv file
data <- read.csv("your_data.csv", row.names = 1, header = TRUE)
# from txt file
data <- read.table("your_data.txt", header = TRUE)
# then, in the argument in the mixOmics functions, just fill with:
# X = data
For more details about the arguments used to modify those functions, type ?read.table in the R console.
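To see what row.names = 1 does in practice, here is a small self-contained sketch that writes a made-up table to a temporary file and reads it back. The sample and gene names are hypothetical and the values are arbitrary; only base R functions are used.

```r
# A tiny, made-up table: 3 samples (rows) by 2 variables (columns).
toy <- data.frame(gene1 = c(1.2, 0.8, 1.5),
                  gene2 = c(3.4, 2.9, 3.1),
                  row.names = c("sample1", "sample2", "sample3"))

# Write it to a temporary CSV file; row names go into the first column.
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, csv_path)

# Read it back: row.names = 1 turns the first column into row names,
# so only the numeric variables remain as columns.
data <- read.csv(csv_path, row.names = 1, header = TRUE)

dim(data)                 # samples x variables
rownames(data)            # sample names
sapply(data, is.numeric)  # every column should be numeric

unlink(csv_path)          # remove the temporary file
```

Checking the dimensions, the row names and the column types right after import is a quick way to catch a header or row-name problem before passing the object as X.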
2.4 Quick start in mixOmics
Each analysis should follow this workflow:
- Run the method
- Graphical representation of the samples
- Graphical representation of the variables
Then use your critical thinking, together with additional functions and visual tools, to make sense of your data! Some of these tools are listed in 1.2.2 and will be described in the next chapters.
For instance, for Principal Components Analysis, we first load the data:
data(nutrimouse)
X <- nutrimouse$gene
Then use the following steps:
MyResult.pca <- pca(X)     # 1 Run the method
plotIndiv(MyResult.pca)    # 2 Plot the samples
plotVar(MyResult.pca)      # 3 Plot the variables
This is only a first quick start; there are many avenues you can take to deepen your exploratory and integrative analyses. The package proposes several methods to perform variable (or feature) selection to identify the relevant information from rather large omics data sets. The sparse methods are listed in the Table in 1.2.2.
Following our example here, sparse PCA can be applied to select the top 5 variables contributing to each of the two components in PCA. The user specifies the number of variables to select on each component; here, 5 variables are selected on each of the first two components (keepX = c(5, 5)):
MyResult.spca <- spca(X, keepX=c(5,5))    # 1 Run the method
plotIndiv(MyResult.spca)                  # 2 Plot the samples
plotVar(MyResult.spca)                    # 3 Plot the variables
You can now see that we have considerably reduced the number of genes in the plotVar correlation circle plot.
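Beyond the plot, the selected genes can also be listed by name. The sketch below uses the selectVar function from mixOmics on the sparse PCA result; the requireNamespace guard and the selected object are additions for illustration, so the snippet skips cleanly when the package is not installed.

```r
if (requireNamespace("mixOmics", quietly = TRUE)) {
  library(mixOmics)
  data(nutrimouse)
  X <- nutrimouse$gene

  # Sparse PCA keeping 5 variables on each of the first two components.
  MyResult.spca <- spca(X, keepX = c(5, 5))

  # Names of the variables selected on component 1 and on component 2.
  selected <- lapply(1:2, function(k) selectVar(MyResult.spca, comp = k)$name)
  names(selected) <- c("comp1", "comp2")
  print(selected)
} else {
  selected <- NULL
  message("mixOmics is not installed; skipping the example.")
}
```

With keepX = c(5, 5), each of the two vectors is expected to hold the genes retained on that component, which should match the genes shown in the correlation circle plot.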
Do not stop here! We are not done yet. You can enhance your analyses with the following:
- Have a look at our manual and each of the functions and their examples.
- Run the examples from the help files.
- Have a look at our website, which features many tutorials and case studies.
- Keep reading this vignette: this is just the beginning!
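As one possible way to act on the first two suggestions from the R console, the sketch below opens a help page and runs its examples with the base R functions help and example. The choice of pca as the topic is only an illustration, and the has_mixOmics guard is an addition so that the snippet does nothing when the package is missing.

```r
has_mixOmics <- requireNamespace("mixOmics", quietly = TRUE)

if (has_mixOmics) {
  library(mixOmics)

  # Display the help page for pca (typing ?pca at the console does the same).
  print(help("pca", package = "mixOmics"))

  # Run the examples from that help page without pausing between plots.
  example("pca", package = "mixOmics", ask = FALSE)
}
```

The same two calls work for any other function in the package by changing the topic name.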