
Run Seurat (an R package) in a notebook interface on a server without root

tl;dr: Dependencies hurt when you try to set up R on a server

After playing around for a while, I’d say the best way to use R with a notebook-style interface on a server where you are not a superuser is to install Anaconda and run R inside it to get whatever packages you need. Anaconda is designed to run as a normal user, so there’s no need for superuser permission, and dependency issues are taken care of most of the time.
If you do have superuser permission on the server and are comfortable with RStudio, RStudio Server might be an even better option. Even when the packages are available from the Conda R channel, conda-forge, or Bioconda, they are sometimes out of date, which can break dependencies, and conda skeleton and conda build unfortunately come with plenty of dependency issues as well. Besides, I find the environment view and the documentation integration better in RStudio.
What worked for me started with creating a new environment in Anaconda and installing r-essentials, which should provide a platform ready for Jupyter Notebook.
conda create --name R
conda activate R    # "source activate R" on conda older than 4.4
conda install -c r r-essentials
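Before launching a notebook, it can help to confirm that Jupyter actually sees an R kernel in the environment. The following is a minimal sketch, not something from the original setup: has_r_kernel is a hypothetical helper name, and it assumes that the IRkernel shipped with r-essentials registers itself under the kernel name ir.

```shell
# Hypothetical helper: succeed if Jupyter lists a kernel named "ir".
# Assumption: IRkernel (bundled with r-essentials) registers as "ir".
has_r_kernel() {
    # "jupyter kernelspec list" prints one "<name> <path>" pair per line;
    # awk exits 0 only if some line has "ir" as its first field.
    jupyter kernelspec list 2>/dev/null |
        awk '$1 == "ir" { found = 1 } END { exit !found }'
}
```

After activating the environment, something like has_r_kernel && jupyter notebook --no-browser would start the notebook server only when the R kernel is registered.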
Note that the packages on Conda are sometimes out of date, which can cause version conflicts among R packages, so check the different channels for the package you need with the version number in mind. It might be simpler to just call install.packages() from within R in that environment, though.
During package installation, one error message popped up: ImportError: /[Conda path]/env/libgfortran.so.4, which looked as if libgfortran was not installed on my system. I ran conda install libgfortran gfortran libgcc gcc, hoping to get the libraries in place, but that did not fix the error. It turned out I needed to run export LD_LIBRARY_PATH=[Conda lib path for the virtual environment]; otherwise R did not seem to know where to find the libraries installed by Conda. After that, I was finally good to go with Jupyter Notebook running R.
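The LD_LIBRARY_PATH fix can be sketched as a small helper. This is an illustration under assumptions rather than the exact command used above: prepend_lib_path is a hypothetical name, and the sketch assumes the environment's libraries live in the lib directory under $CONDA_PREFIX, the variable conda sets when an environment is activated.

```shell
# Hypothetical helper: put <env prefix>/lib in front of LD_LIBRARY_PATH
# while keeping whatever was already there.
prepend_lib_path() {
    env_prefix="$1"
    # ${VAR:+...} expands only if VAR is set and non-empty, so no stray
    # colon is appended when LD_LIBRARY_PATH starts out empty.
    LD_LIBRARY_PATH="${env_prefix}/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
    export LD_LIBRARY_PATH
}

# Only act when a conda environment is active (CONDA_PREFIX is set).
if [ -n "${CONDA_PREFIX:-}" ]; then
    prepend_lib_path "$CONDA_PREFIX"
fi
```

Prepending rather than overwriting keeps any system library paths that other tools on the server may still rely on.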
The take-home message for me is that installing required libraries into a user directory can be complicated, and their dependency chains are often too long to manage by hand. I found out I needed cmake to install something, but when I tried to build cmake from source, it warned that no compiler supporting C++11 was found; then I tried to get a newer version of gcc, but cmake’s configure script still could not find that copy of gcc… I spent almost 3 days in this maze of dependencies, and though I feel I learned a lot, I was not able to fix things this way.
Anyway, I guess that’s why we need package management. You can find my other unsuccessful attempts below if you are interested.

Attempt 1: Setting up the R kernel with IPython Notebook in Anaconda

Anaconda keeps an independent copy of R when it is installed with conda install -c r r-essentials. I wanted to install the latest version of Seurat instead of the Conda version, so I googled how to install packages from CRAN in Conda and found this article. Following the thread, I tried something like install.packages("Seurat", lib = [conda R path]) from my user copy of R.
There is actually a good reason not to do this: install.packages() decides which dependencies to install based on what is already installed for the copy of R that runs it, so if you run install.packages() from another copy of R, the dependencies end up in a mess.
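One way to avoid mixing up the two copies is to check which R the shell would start before installing anything. The following is a rough sketch: r_is_from_conda is a hypothetical helper, and it assumes the active environment's prefix is available in $CONDA_PREFIX.

```shell
# Hypothetical helper: report whether the R found on PATH lives inside
# the active conda environment.
# Returns 0 if it does, 1 if it is some other copy, 2 if no R is found.
r_is_from_conda() {
    r_path="$(command -v R)" || return 2
    case "$r_path" in
        # The quoted prefix is matched literally; /* matches the rest.
        "${CONDA_PREFIX:-/nonexistent}"/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

If the check fails, the R on PATH is the user or system copy; running Rscript -e '.libPaths()' then shows which library directories that copy would consult and install into.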
So I switched to the Conda copy of R, ran install.packages("Seurat"), and found out…
ERROR: dependencies ‘igraph’, ‘diffusionMap’ are not available for package ‘Seurat’
* removing ‘[My Conda path]/R/library/Seurat’
It turned out igraph had failed to compile, which according to this thread seems to result from missing system libraries: I would need libssl-dev, libcurl4-openssl-dev, and libssh2-1-dev. I gave up at this point because I saw a labyrinth of dependencies ahead.

Attempt 2: Install RStudio Server without root

If I could do this, I would be able to use my user copy of R, which I am okay with, and I like RStudio quite a lot. Long story short: installation is possible, but execution is not. Basically, you can follow the official instructions and finish with make install prefix=[somewhere you can write]. Nonetheless, starting the server unsurprisingly requires you to be root.
One more hilarious thing: we did not have cmake on that server, and when I tried to install a copy, it failed because no compiler supporting C++11 was available (though which g++ showed a nice one). Only after going through all of this did I notice that there was already a copy of rstudio-server on the machine I was using.

Attempt 3: Make Jupyter Notebook use my user copy of R

I entered the R console and tried devtools::install_github("IRkernel/IRkernel"). Guess what, it failed with the error message: Installation failed: An unknown option was passed in to libcurl.
Alas. A bit of troubleshooting indicated that I might have a version issue with libcurl, but I don’t know which one.
Alternatively, I tried install.packages("JuniperKernel"), which should serve the same function as IRkernel, but it has its own dependency issues: it requires gdtools, which won’t compile without cairo, and cairo requires libpng and pixman. I tried to build libpng and pixman from source with ./configure && make && make install prefix=$HOME/.local/bin, but pixman kept complaining that it could not find png.h. I was not able to fix this and turned to other options, but I am still curious how I could have fixed it.
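One possible explanation, offered as a guess and not verified on that server: with a standard autotools layout, make install prefix=$HOME/.local/bin places headers under $HOME/.local/bin/include, where a later configure run would not look. A common pattern is to pass the prefix to configure and point the compiler, linker, and pkg-config at it. The sketch below assumes that pattern; use_local_prefix and the source directory names in the comments are hypothetical.

```shell
# Hypothetical helper: make a user-writable prefix visible to later builds.
use_local_prefix() {
    prefix="${1:-$HOME/.local}"
    export PATH="$prefix/bin:$PATH"
    # Headers (e.g. png.h) are searched via CPPFLAGS, libraries via LDFLAGS.
    export CPPFLAGS="-I$prefix/include${CPPFLAGS:+ $CPPFLAGS}"
    export LDFLAGS="-L$prefix/lib${LDFLAGS:+ $LDFLAGS}"
    # pkg-config looks for *.pc files in the directories listed here.
    export PKG_CONFIG_PATH="$prefix/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"
    # Needed at run time so the freshly built shared libraries are found.
    export LD_LIBRARY_PATH="$prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}

# Intended order (directory names are placeholders), libpng before pixman:
#   use_local_prefix "$HOME/.local"
#   (cd libpng-src && ./configure --prefix="$HOME/.local" && make && make install)
#   (cd pixman-src && ./configure --prefix="$HOME/.local" && make && make install)
```

Setting the prefix at configure time also means the generated pkg-config files record the real install location, which is what later builds such as cairo would query.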

Attempt 4: Fine, I’ll just go with the Conda package

By now it seemed that my hope of using the CRAN version of Seurat was not that wise, and my dataset is in fact compatible with Seurat v2.2.0 on Bioconda. Perhaps this would be a much more efficient workaround.
Since it is in the bioconda repository, I needed to set up the channels first.
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
and then conda install -c bioconda r-seurat.
What I got was
UnsatisfiableError: The following specifications were found to be in conflict:
- r-reprex
- r-seurat
Checking the dependencies with conda info r-seurat showed that it requires version 3.4.1 of r-base, while r-reprex needs >3.4.3. This conflict prevented me from downgrading r-base.
Instead, I built Seurat from CRAN with conda skeleton cran Seurat. Interestingly, it saved the recipe in [pwd]/r-seurat instead of [pwd]/Seurat. With conda build r-seurat, guess what I got?
Error : object ‘map_dfr’ is not exported by 'namespace:purrr'
ERROR: lazy loading failed for package ‘Seurat’
subprocess.CalledProcessError: Command '['/bin/bash', '-e', '[pwd]/anaconda/conda-bld/r-seurat_1525165846607/work/conda_build.sh']' returned non-zero exit status 1.
It turned out Seurat requires purrr::map_dfr(), which was introduced in a newer version of purrr than the one on Bioconda. A newer version of purrr from conda-forge fixed this, but then I hit the compilation error mentioned earlier in this post.
