Extra Q and As from Claus:
What other omics integration tools have you tested on your datasets (DIABLO, MINT, sMBPLS...)?
This particular data set is more than 10 years old. I think Lê Cao had published some of her papers by then, but the mixOmics package didn’t exist yet and neither did DIABLO and MINT. In a way, doing this seminar was a way for me to get into these packages a bit more. At that time we used the ade4 and made4 packages, which I think are forerunners of omicade4. I wasn’t aware of the sMBPLS package, but it looks very interesting, so I will check it out. I think there are slightly different cultures depending on which omics data you deal with. The metabolomics world seems very strong on variants of PLS, whereas the gene expression folk seem to love network analysis (which you can regard as multivariate analysis too, but which is very different).
Is there any rule for a maximum number of variables that can be assessed according to the number of samples (when the number of variables is huge but the number of subjects does not exceed a few hundred)?
No, I don’t think so. The number of variables will only ever grow relative to the number of samples, and saying there is a maximum number would imply that we (as statisticians) refuse to accept our responsibility to deal with that. Statistics has developed rapidly in this area in the last 20 years, and the best and most successful methods tend to make use of the high dimensionality rather than seeing it as a curse.
A PDF copy of the webinar presentation is available in the attachment to this message for ECN members.