Learning Journey and Reflections on R/Medicine 2022 Conference Part 1
By Jeremy Selva in R/Medicine 2022
August 23, 2022
Introduction
Time has flown since the last conference I attended. I was fortunate to be given the chance to attend the R/Medicine 2022 Conference and to give a lightning talk as well.
The general goal of the R/Medicine Conference is to help practitioners in the clinical and medical sciences improve the quality of their research workflow with the use of R-based tools. The focus is mainly on the application of R to create effective workflows and applications that promote reproducible data processing, reporting and visualisation.
Despite having to attend the conference in the middle of the night … again, it was indeed a fruitful experience. Here is a write-up about my experiences at the conference for Day 1 and Day 2.
Day 1
I attended two workshops: Clinical Reporting with gtsummary and Using Public Data and Maps for Powerful Data Visualization.
Clinical Reporting with {gtsummary}
I have never actually created a clinical summary table before, but I have heard from colleagues how difficult it is to do this in Excel and then transfer the tabular results to Word. Things get pretty ugly when the table needs to be updated because of a new influx of study samples and/or when there is a large amount of clinical information to report. Having an R package like gtsummary that is able to create such reports automatically is beneficial to the medical community.
The course was led by Daniel Sjoberg, with Shannon Pileggi and Karissa Whiting assisting by answering most of the Q & A and consolidating the answers. It turns out that gtsummary is an extension of the R package gt, designed to make publication-ready tables. Other extensions of gt are gtExtras and a recent one for reporting adverse events called gtreg. gtreg will be presented by Shannon during the R/Medicine 2022 Conference as well.
The first part of the course covered using gtsummary to create a table of summary statistics with tbl_summary. It then went on to show how to customise the output using functions like add_p, add_overall and add_difference. An overview of tbl_summary can be found in this tutorial.
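As a rough sketch of what this first part looks like in practice (my own minimal example, not the workshop code), the snippet below summarises gtsummary's bundled `trial` data set by treatment arm. The choice of data set and of the variables `age`, `grade` and `response` is an assumption made purely for illustration.

```r
library(gtsummary)

# `trial` is an example data set shipped with gtsummary. Using it here is an
# assumption for illustration; the workshop may have used other data/options.
summary_tbl <- trial |>
  tbl_summary(
    by = trt,                          # one column per treatment arm
    include = c(age, grade, response)  # variables to summarise
  ) |>
  add_p() |>       # add p-values comparing the treatment arms
  add_overall()    # add a column summarising all patients together

summary_tbl
```

Printing `summary_tbl` in an R Markdown or Quarto document renders the table; when the underlying data change, re-running the chunk regenerates it.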
The second part of the course was about reporting regression results as a table using tbl_regression. It was amazing that the program was able to create a report table that shows (with a dash) which group in a categorical variable is used as the reference. I felt a bit bad for asking how to change the dash to “Ref.”, because it turned out that the way to do this is not so trivial. See Q16 of the Q & A for the answer. An overview of tbl_regression can be found in this tutorial.
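For readers who have not seen tbl_regression before, here is a small sketch under my own assumptions: a logistic regression on gtsummary's bundled `trial` data, which is not necessarily the model used in the workshop. The last step shows one way I am aware of to label reference rows with "Ref." using modify_table_styling; it may differ from the answer given in the Q & A.

```r
library(gtsummary)

# Logistic regression on gtsummary's bundled `trial` data (illustrative only).
fit <- glm(response ~ age + grade, data = trial, family = binomial)

# exponentiate = TRUE reports odds ratios instead of log odds.
reg_tbl <- tbl_regression(fit, exponentiate = TRUE)

# One possible way (an assumption, not necessarily the Q & A answer) to show
# "Ref." instead of a dash in the estimate column of reference rows.
reg_ref_tbl <- reg_tbl |>
  modify_table_styling(
    columns = estimate,
    rows = reference_row %in% TRUE,
    missing_symbol = "Ref."
  )

reg_ref_tbl
```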
The last part was using inline_text to give a short summary of what the report table is trying to show. An overview of inline_text can be found in this tutorial.
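A minimal sketch of inline_text, again using a model I made up on the bundled `trial` data rather than anything from the course: the function pulls the statistics for one variable out of a gtsummary table as a string, so the numbers quoted in the text always match the table.

```r
library(gtsummary)

# Illustrative model on the bundled `trial` data (an assumption).
fit <- glm(response ~ age + grade, data = trial, family = binomial)
reg_tbl <- tbl_regression(fit, exponentiate = TRUE)

# Extract the reported statistics for `age` as a single string.
age_text <- inline_text(reg_tbl, variable = age)
age_text
```

In an R Markdown or Quarto document, the same call can be placed in an inline R expression so that the sentence updates whenever the model does.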
Due to a lack of time, the instructor quickly went through ways to combine different summary tables, summary table theming and print engines. Overall, it was a very informative session and I learned a lot from it.
- Course Material
- Course Slides
- Video
Using Public Data and Maps for Powerful Data Visualization
The course was conducted by Joy Payton, who currently works at the Children’s Hospital of Philadelphia (CHOP) Research Institute. The session was about learning how to use R to access public data sets through an Application Programming Interface (API), and to do geospatial analysis with the data received.
Despite having limited experience in doing such analysis, I was able to follow the course without much trouble. Joy had put a lot of effort into the course slides to make them comprehensible to an audience with limited computing experience.
The course started with a gentle introduction to APIs and how an API can be used with R to extract relevant information from public databases automatically, reducing the need to type many web addresses and click on several buttons repeatedly just to get access to the data. The exercises covered ways to get information from the
- PubMed database, using the R package easyPubMed to communicate with the API
- Socrata database, manually using the Socrata Open Data API (SODA) to get some map data. Users may consider using RSocrata to obtain data as well.
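To give a feel for what calling SODA "manually" means, here is a small sketch of my own. The helper names `build_soda_url` and `fetch_soda`, the domain `data.example.org` and the dataset ID `abcd-1234` are all hypothetical placeholders; substitute a real Socrata portal and dataset ID before fetching anything.

```r
# Build a SODA request URL. SODA endpoints follow the pattern
# https://<domain>/resource/<dataset-id>.json, with query options such as
# $limit appended. `domain` and `dataset_id` are placeholders.
build_soda_url <- function(domain, dataset_id, limit = 5) {
  paste0("https://", domain, "/resource/", dataset_id, ".json?$limit=", limit)
}

# jsonlite::fromJSON() accepts a URL and returns the records as a data frame.
fetch_soda <- function(url) {
  jsonlite::fromJSON(url)
}

url <- build_soda_url("data.example.org", "abcd-1234")
url

# With a real domain and dataset ID, the records could then be fetched with:
# records <- fetch_soda(url)
```

The point is that the "API call" is just a web address assembled in code, so it can be repeated or modified without any clicking.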
The course then focused on ways to extract geospatial data from the web and plot them using the R package leaflet. Two types of map files were used during the course, namely GeoJSON and Shapefile.
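A minimal sketch of plotting a map file with leaflet, under my own assumptions: it reads the North Carolina Shapefile that ships with the sf package (not the course data), and uses sf for reading, which may not be what the course used.

```r
library(sf)
library(leaflet)

# Example Shapefile bundled with sf; a stand-in for the course's map files.
shp_path <- system.file("shape/nc.shp", package = "sf")
counties <- st_read(shp_path, quiet = TRUE)

# st_read() reads GeoJSON files in the same way when given a .geojson path.

# leaflet expects longitude/latitude coordinates (WGS84, EPSG:4326).
counties <- st_transform(counties, 4326)

county_map <- leaflet(counties) |>
  addTiles() |>                                          # background map tiles
  addPolygons(weight = 1, fillOpacity = 0.3, label = ~NAME)  # county outlines

county_map
```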
The last part of the course was a revision of what had been taught earlier, this time using the US Census Data from the Census Bureau as an example. The difference was that there was a need to obtain an API key by first signing up with your email on this webpage. Once the Census API key has been emailed to us, we can use the R package tidycensus to get relevant census and geospatial data to visualise the percentage of Latinos living in different parts of Philadelphia in 2010. This information was important for decision makers deciding where to build a clinic dedicated to Spanish-speaking patients and to meeting their health needs.
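The sketch below is my own reconstruction of that idea, not the course code. The helper `pct_of_total` is hypothetical, and the variable codes `P004003` (Hispanic or Latino) and `P001001` (total population) are what I believe the 2010 decennial summary file uses; they can be checked with `tidycensus::load_variables(2010, "sf1")`. The request only runs when a key is available in the `CENSUS_API_KEY` environment variable.

```r
# Hypothetical helper: percentage of a total, NA where the total is zero.
pct_of_total <- function(count, total) {
  ifelse(total > 0, 100 * count / total, NA_real_)
}

# Only query the Census API when a key has been set.
if (nzchar(Sys.getenv("CENSUS_API_KEY"))) {
  latino <- tidycensus::get_decennial(
    geography = "tract",
    variables = "P004003",    # assumed code: Hispanic or Latino population
    summary_var = "P001001",  # assumed code: total population
    state = "PA",
    county = "Philadelphia",
    year = 2010,
    geometry = TRUE           # also download tract boundaries for mapping
  )
  latino$pct_latino <- pct_of_total(latino$value, latino$summary_value)
}
```

The resulting `pct_latino` column can then be mapped per census tract, for example with leaflet.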
I have to admit that dealing with geospatial data can be daunting, with a lot of jargon that I was not used to seeing. However, Joy’s exercise notes were very useful for a newbie like me. The notes helped me understand what the jargon in the data meant and what the R script was doing to the data. Getting the code to run and seeing the nice results at the end was indeed rewarding.
- Course Material
- Course Slides
- Examples used
- Video
Day 2
For Day 2 of the conference, I attended the following workshops: Differential Expression using R and Reproducible Research with Quarto.
Differential Expression using R
Siavash Ghaffari from Procogia provided an overview of the workflow of Bulk RNA Sequence Analysis. While most of the workflow was done by other bioinformatics tools, R was used to do the differential expression analysis.
The input data consisted of:
- A read counts table of RNA sequences taken from five patients’ tumour cells seeded into plates and treated with either DMSO (the control) or an LSD1 inhibitor for 24 hours.
- Additional information on the ten tumour cell samples.
The Bioconductor package DESeq2 was used to do the majority of the differential expression analysis, such as:
- Comparing log fold changes between the control and treatment groups, as well as using apeglm for effect size shrinkage.
- Creating plots like the dispersion plot (DESeq2::plotDispEsts) and the MA plot (DESeq2::plotMA).
- Extracting a results table using DESeq2::results
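The steps in the list above can be sketched as follows. This is my own minimal example on simulated counts from DESeq2's makeExampleDESeqDataSet (ten samples in two conditions, A and B), standing in for the workshop's real data and design, which I am not reproducing here. The shrinkage step assumes the apeglm package is installed.

```r
library(DESeq2)

set.seed(1)
# Simulated data set: 1000 genes, 10 samples split into conditions A and B.
# This is a stand-in for the workshop's read counts table (an assumption).
dds <- makeExampleDESeqDataSet(n = 1000, m = 10)

dds <- DESeq(dds)        # size factors, dispersions and model fitting
res <- results(dds)      # results table for condition B versus A

plotDispEsts(dds)        # dispersion plot

# Shrink the log fold changes with apeglm (requires the apeglm package).
res_shrunk <- lfcShrink(dds, coef = "condition_B_vs_A", type = "apeglm")
plotMA(res_shrunk)       # MA plot of the shrunken estimates
```

`resultsNames(dds)` lists the coefficient names that can be passed to `coef`.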
Next, the ENSEMBL IDs in the results data set were mapped to human gene names using AnnotationDbi::mapIds before the results were written out to CSV files.
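A small sketch of this mapping step, under my own assumptions: it uses the org.Hs.eg.db annotation package as the lookup database (the workshop may have used a different one) and two well-known ENSEMBL gene IDs in place of a real results table.

```r
library(AnnotationDbi)
library(org.Hs.eg.db)

# Two example ENSEMBL gene IDs (TP53 and BRCA1) standing in for the IDs
# in a real results table.
ensembl_ids <- c("ENSG00000141510", "ENSG00000012048")

# Map each ENSEMBL ID to a gene symbol; keep the first match if several exist.
symbols <- mapIds(
  org.Hs.eg.db,
  keys = ensembl_ids,
  column = "SYMBOL",
  keytype = "ENSEMBL",
  multiVals = "first"
)

annotated <- data.frame(ensembl_id = ensembl_ids, symbol = unname(symbols))

# Write the annotated table to a CSV file (here in a temporary directory).
out_file <- file.path(tempdir(), "annotated_results.csv")
write.csv(annotated, out_file, row.names = FALSE)
```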
The workshop then proceeded to different ways to visualise the differential expression data. Prior to visualisation, the count data is first transformed using a variance stabilising function, DESeq2::varianceStabilizingTransformation. With the transformed data, several plots were created, for example:
- PCA plots using DESeq2::plotPCA
- Hierarchical clustering and gene expression heat map using ComplexHeatmap::Heatmap
- Volcano plots using EnhancedVolcano::EnhancedVolcano
- Individual gene’s read counts comparison plots in relation to clinical attributes using ggplot2::ggplot
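As a short illustration of the transform-then-plot pattern (my own sketch on simulated data from makeExampleDESeqDataSet, not the workshop's data), the code below applies the variance stabilising transformation and draws a PCA plot coloured by the simulated `condition` column.

```r
library(DESeq2)

set.seed(1)
# Simulated counts: 1000 genes, 10 samples in two conditions (an assumption
# standing in for the workshop's data).
dds <- makeExampleDESeqDataSet(n = 1000, m = 10)

# Variance stabilising transformation; blind = TRUE ignores the design.
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)

# PCA of the transformed counts, coloured by condition.
pca_plot <- plotPCA(vsd, intgroup = "condition")
pca_plot
```

The transformed values are available as a matrix via `assay(vsd)`, which is the kind of input a heat map function would take.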
It was a pity that we could not go through enrichment analysis, which was what most participants were looking forward to. Nevertheless, it was a decent workshop for beginners like me. I hope to see the team conduct more of such workshops in the future.
- Course Material
- Video
Reproducible Research with Quarto
The last workshop of the day, on Quarto, was run by Tom Mock.
To date, I have only managed to create two Quarto documents. One is a lab work from an R for Data Science book club; the other is a write-up on how to create interactive plots using plotly and display them as trellis plots using trelliscopejs. Hopefully I could learn something new today.
When I realised that the content of the workshop was going to be just slide presentations and demonstrations, I was initially worried that I would fall asleep, because it was 3.30 am where I live and the content seemed quite technical (if you have little programming experience, it can be hard to understand things like Jupyter, Lua, Bootstrap, JavaScript, etc.). It was a miracle that I was able to pull through till around 7 am and ask a few questions along the way.
Special thanks to the one who asked if Howard, the instructor’s pet dog seen in most of the slides, was given extra treats/food for contributing heavily to the course content. That really cracked me up internally, though I could not laugh out loud because everyone else was asleep where I live.
At least it is comforting to know that I can still create GitHub Flavoured Markdown documents using Quarto and that a hugo option is present as well. This blog post comes from a Quarto .qmd file.
As for the issue of calling Quarto in the terminal of RStudio: on my computer, both Windows 10 Command Line and PowerShell use quarto. Git Bash, on the other hand, uses quarto.cmd.
There is currently work in progress to make the version of Quarto used during the R session available in session Info, as discussed in this GitHub Issue. Do give your support.
Another great question was the following:
Great R-Medicine Quarto Workshop question about custom numbering of figure sub-captions.
Possible w/ YAML:
---
crossref:
  labels: alpha a # (default is arabic)
  subref-labels: roman i # (default is alpha a)
---
Docs: https://t.co/Rd3dA4IEj8 #QuartoPub #RMed2022
(Tom Mock, @thomas_mock, August 25, 2022)
I guess I will slowly look through the course material in my free time.