Why R?

Transparency and reproducibility of analyses (open science with R)

Transparency and reproducibility are two central aspects of open science. In this spirit, the entire research process (planning, data collection, analysis, etc.) should be documented transparently, and this documentation should be made publicly accessible. This makes every step of the research process traceable and reproducible. This is recommended, for example, by UNESCO: UNESCO Recommendation on Open Science.

The analytical part of the research process (e.g. data preparation and statistical data analysis) should likewise be documented transparently and made accessible. The UNESCO Recommendation on Open Science states:

“Open research data that include, among others, digital and analogue data, both raw and processed, and the accompanying metadata, as well as numerical scores, textual records, images and sounds, protocols, analysis code and workflows that can be openly used, reused, retained and redistributed by anyone, subject to acknowledgement. Open research data are available in a timely and user-friendly, human- and machine-readable and actionable format, in accordance with principles of good data governance and stewardship, notably the FAIR (Findable, Accessible, Interoperable, and Reusable) principles, supported by regular curation and maintenance.”

The programming language R is very well suited to transparent and reproducible data analysis in the spirit of open science. The R code documents every analysis step, from data preparation to statistical analysis. Here is an example that computes a mean (including a boxplot visualization):

library(tidyverse) # Load a comprehensive collection of packages (toolbox) https://tidyverse.tidyverse.org/
library(patchwork) # Load another package (for arranging several plots) https://patchwork.data-imaginist.com/

### Filter the data
### Exclude outliers (reaction time ≥ 5000)

DATA_OHNE_AUSREISSER <-
  DATA %>%
  filter(REAKTIONSZEIT < 5000) # Keep children with a reaction time < 5000

### Boxplot with mean

DATA_OHNE_AUSREISSER %>%
  ggplot(aes(x = REAKTIONSZEIT, y = "")) +
  geom_boxplot(fill = "#78C2AD") +
  stat_summary(fun = mean, geom = "point", color = "red") +
  stat_summary(fun = mean, geom = "text", aes(label = paste0("M = ", round(after_stat(x)))), color = "red", vjust = -7) + # after_stat(x) replaces the deprecated ..x.. notation
  xlim(c(0, 6000)) +
  ylab(NULL) +
  labs(caption = "in milliseconds") +
  theme_minimal(base_size = 12)

This example documents that the mean was computed after excluding specific outliers. With the openly available R code, the analysis can be traced, critically examined, and even repeated (i.e. reproduced), provided the corresponding dataset is also openly available. The analyses can then be modified and extended, for example when critical researchers question the exclusion of outliers and want to compare the results with and without them:

### Filter the data
### Exclude outliers (reaction time ≥ 5000)

DATA_OHNE_AUSREISSER <-
  DATA %>%
  filter(REAKTIONSZEIT < 5000) # Keep children with a reaction time < 5000

### Analysis without outliers (reaction time < 5000)

BOXPLOT_1 <-
  DATA_OHNE_AUSREISSER %>%
  ggplot(aes(x = REAKTIONSZEIT, y = "")) +
  geom_boxplot(fill = "#78C2AD") +
  stat_summary(fun = mean, geom = "point", color = "red") +
  stat_summary(fun = mean, geom = "text", aes(label = paste0("M = ", round(after_stat(x)))), color = "red", vjust = -5) +
  xlim(c(0, 6000)) +
  ylab(NULL) +
  #labs(caption = "in milliseconds") +
  ggtitle("Analysis without outliers (reaction time < 5000)") +
  theme_minimal(base_size = 12)

### Analysis with outliers (reaction time ≥ 5000)

BOXPLOT_2 <-
  DATA %>%
  ggplot(aes(x = REAKTIONSZEIT, y = "")) +
  geom_boxplot(fill = "#78C2AD") +
  stat_summary(fun = mean, geom = "point", color = "red") +
  stat_summary(fun = mean, geom = "text", aes(label = paste0("M = ", round(after_stat(x)))), color = "red", vjust = -5) +
  xlim(c(0, 6000)) +
  ylab(NULL) +
  labs(caption = "in milliseconds") +
  ggtitle("Analysis with outliers (reaction time ≥ 5000)") +
  theme_minimal(base_size = 12)

BOXPLOT_1 / BOXPLOT_2 + plot_layout(axes = "collect") # Show both boxplots stacked vertically
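The comparison with and without outliers can also be checked numerically. The following is a minimal sketch, assuming a small hypothetical data frame (the reaction times are invented for illustration and are not the workshop data). It uses base R `subset()` in place of `filter()` so that it runs without additional packages.

```r
# Hypothetical reaction times in milliseconds (invented for illustration)
DATA <- data.frame(REAKTIONSZEIT = c(850, 920, 1100, 1340, 1500, 5200, 7400))

# Same exclusion rule as in the example: keep reaction times < 5000
DATA_OHNE_AUSREISSER <- subset(DATA, REAKTIONSZEIT < 5000)

# Mean with and without the outliers
M_MIT_AUSREISSERN <- mean(DATA$REAKTIONSZEIT)
M_OHNE_AUSREISSER <- mean(DATA_OHNE_AUSREISSER$REAKTIONSZEIT)

round(c(with_outliers = M_MIT_AUSREISSERN, without_outliers = M_OHNE_AUSREISSER))
```

Reporting both values side by side lets readers judge how strongly the exclusion rule drives the result.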

Practicing open science means making the R code and the corresponding data freely available. Open science repositories such as OSF, Figshare, Zenodo, or GitHub can be used for this purpose. Care must of course be taken that the published data contain no personal data within the meaning of the GDPR (DSGVO). Below are some examples from research practice in which the data and the R code were published. Each paper refers to the open science repository where the data and the R code are deposited.

Example 1

Kashikar, L., Soemers, L., Lüke, T., & Grosche, M. (2024). Influence of the ‘Learning Disability’ label on teachers’ performance expectations—a matter of attitudes towards inclusion? In Journal of Research in Special Educational Needs (Vol. 24, Issue 3, pp. 696–712).

https://doi.org/10.1111/1471-3802.12664 (Paper)
https://osf.io/k8ejv/ (Overview)
https://osf.io/k8ejv/files/osfstorage (Data and R code)

Example 2

Young, E. S., Frankenhuis, W. E., DelPriore, D. J., & Ellis, B. J. (2022). Hidden talents in context: Cognitive performance with abstract versus ecological stimuli among adversity‐exposed youth. In Child Development (Vol. 93, Issue 5, pp. 1493–1510).

https://doi.org/10.1111/cdev.13766 (Paper)
https://github.com/ethan-young/hidden-talents-multiverse (Data and R code)

Example 3

Heyard, R., Ott, M., Salanti, G., & Egger, M. (2022). Rethinking the Funding Line at the Swiss National Science Foundation: Bayesian Ranking and Lottery. In Statistics and Public Policy (Vol. 9, Issue 1, pp. 110–121).

https://doi.org/10.1080/2330443X.2022.2086190 (Paper)
https://doi.org/10.5281/zenodo.4531159 (Data)
https://snsf-data.github.io/ERpaper-online-supplement/ (R code as a website)
https://github.com/snsf-data/ERforResearch (R code on GitHub)

Example 4

Kulawiak, P. R., Poltz, N., Bosch, J., & Dreesmann, M. (2025). Understanding teachers’ perspectives on students with epilepsy in Germany: A survey examining knowledge, experience, and affective, cognitive, and behavioral attitudes to inform teacher training. In Epilepsy & Behavior (Vol. 163, p. 110157).

https://doi.org/10.1016/j.yebeh.2024.110157 (Paper)
https://doi.org/10.5281/zenodo.14210541 (Data and R code)
https://pawelkulawiak.github.io/supplepi/ (Data and R code as a website)
https://github.com/PawelKulawiak/supplepi (Data and R code on GitHub)

Aesthetic, adaptive, and flexible output formats (SPSS vs. R)

With R, results (the output of the analyses), whether tables or figures, can be customized and styled very effectively, which allows polished reporting in a range of formats (Word, PDF, PowerPoint, HTML, and many more). The different output formats can be produced with Quarto, which is bundled with RStudio.

SPSS output, by contrast, is often cluttered and cumbersome to customize. Here is an example of an SPSS correlation table:

Data source for the correlation table

Díez-Palomar J, García-Carrión R, Hargreaves L, Vieites M (2020) Transforming students’ attitudes towards learning through the use of successful educational actions. PLoS ONE 15(10): e0240292.

https://doi.org/10.1371/journal.pone.0240292 (Paper)
https://doi.org/10.1371/journal.pone.0240292.s002 (SPSS dataset pone.0240292.s002.sav)

The SPSS output has to be copied into Word or Excel and processed further there if, for example, the correlation table is to be presented in APA format. The following video shows this (cumbersome) procedure.

In R, by contrast, we can very efficiently produce an adequate correlation table (based on the APA format) as well as a visualization of the correlation table:

library(haven) # Package for importing SPSS data (SAV) https://haven.tidyverse.org/
library(sjPlot) # Package for the correlation table https://strengejacke.github.io/sjPlot/

DATA <-
  read_sav("pone.0240292.s002.sav") # Import the SPSS data

DATA %>%
  select(Q1, Q2, Q3, Q4, Q5) %>% # Select variables
  tab_corr(na.deletion = "pairwise", triangle = "lower") # Correlation table (based on the APA format)

	1) We learn best when the teacher tells us what to do	2) We can learn more when we can express our own ideas	3) Learning through discussion in class is confusing	4) Sometimes, learning in school is boring
1) We learn best when the teacher tells us what to do
2) We can learn more when we can express our own ideas	-0.032
3) Learning through discussion in class is confusing	0.082	0.022
4) Sometimes, learning in school is boring	-0.140**	-0.065	0.090
5) Learning in school is better when we have other adults to work with us	0.126*	0.140**	0.122*	0.016
Computed correlation used pearson-method with pairwise-deletion.
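What "pearson-method with pairwise-deletion" means can be sketched in base R. This is a minimal illustration with invented item responses (the data frame `DEMO` is hypothetical); it uses `cor()` from base R rather than `tab_corr()`, so it shows the underlying computation, not the formatted table.

```r
# Hypothetical item responses with missing values (invented for illustration)
DEMO <- data.frame(
  Q1 = c(1, 2, 3, 4, 5, NA),
  Q2 = c(2, 1, 4, 3, NA, 5),
  Q3 = c(5, 4, NA, 2, 1, 3)
)

# Pearson correlations; each pair of items uses all rows where both are observed
COR_MATRIX <- cor(DEMO, use = "pairwise.complete.obs", method = "pearson")

# Blank out the upper triangle so that only the lower triangle is shown
COR_MATRIX[upper.tri(COR_MATRIX)] <- NA
round(COR_MATRIX, 3)
```

With pairwise deletion, each correlation can be based on a different number of observations, which is worth reporting alongside the table.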

library(correlation) # Package for the visual correlation table https://easystats.github.io/correlation/
library(see) # Package for the visual correlation table https://easystats.github.io/see/
library(gt) # Package for attractive tables https://gt.rstudio.com/

### Visual correlation table

COR_PLOT <-
  DATA %>%
  select(Q1, Q2, Q3, Q4, Q5) %>% # Select variables
  correlation(p_adjust = "none") %>%
  summary(redundant = TRUE) %>%
  plot() +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0, limits = c(-1, 1)) +
  ggtitle(NULL)

### Table with notes (item wording)

TAB <-
  rbind(
  paste0("Q", attr(DATA$Q1, "label")),
  paste0("Q", attr(DATA$Q2, "label")),
  paste0("Q", attr(DATA$Q3, "label")),
  paste0("Q", attr(DATA$Q4, "label")),
  paste0("Q", attr(DATA$Q5, "label"))
  ) %>%
  tibble() %>%
  rename(Items = ".") %>%
  gt() %>%
  cols_align(align = "left") %>%
  tab_options(table.align = "left")

COR_PLOT / TAB + plot_layout(heights = c(8, 4)) # Show plot and table stacked vertically

SPSS syntax vs. R code

Finally, I would like to compare SPSS syntax and R code side by side to show that R code is relatively elegant, concise, and readable.

SPSS syntax

Source: https://www.ibm.com/docs/en/spss-statistics/29.0.0?topic=mgeg-scatterplot-border-boxplots-gpl

BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: salary = col(source(s), name("salary"))
  DATA: salbegin = col(source(s), name("salbegin"))
  GRAPH: begin(origin(5%, 10%), scale(85%, 85%))
  GUIDE: axis(dim(1), label("Beginning Salary"))
  GUIDE: axis(dim(2), label("Current Salary"))
  ELEMENT: point(position(salbegin*salary))
  GRAPH: end()
  GRAPH: begin(origin(5%, 0%), scale(85%, 10%))
  COORD: rect(dim(1))
  GUIDE: axis(dim(1), ticks(null()))
  ELEMENT: schema(position(bin.quantile.letter(salbegin)), size(size."80%"))
  GRAPH: end()
  GRAPH: begin(origin(90%, 10%), scale(10%, 85%))
  COORD: transpose(rect(dim(1)))
  GUIDE: axis(dim(1), ticks(null()))
  ELEMENT: schema(position(bin.quantile.letter(salary)), size(size."80%"))
  GRAPH: end()
END GPL.

R code

library(ggExtra) # Package for marginal boxplots

PLOT <- # Scatter plot (without marginal boxplots)
  DATA %>%
  ggplot(aes(x = Beginning_Salary, y = Current_Salary)) +
  geom_point() +
  labs(x = "Beginning Salary", y = "Current Salary") +
  theme_minimal()

PLOT %>%
  ggMarginal(type = "boxplot",
             margins = "both",
             size = 5,
             fill = "skyblue") # Add marginal boxplots

Interactive output formats (R Shiny apps)

With R, results can be customized and styled very effectively. Quarto, which is bundled with RStudio, can additionally produce a range of formats (Word, PDF, PowerPoint, HTML, and many more). It is even possible to create interactive output, i.e. interactive presentations of results with an interactive user interface (so-called R Shiny apps). An R Shiny app can be run locally on your own computer or used as an online application. The following R Shiny app, for example, displays vocabulary development for individual children.

R Shiny app with fictitious data (vocabulary development)

https://kulawiak.shinyapps.io/demoapp/

A wide range of possibilities (packages)

R can do (almost) anything, and innovative or novel analysis approaches usually become available in R quickly. In that sense, R is a kind of Swiss Army knife. This variety comes from so-called packages. Specific methods and analysis approaches are provided in such packages, and anyone can develop and publish packages. R is therefore a community project to which anyone can contribute.

The functionality of a few packages is illustrated below.

Network analysis: visNetwork & igraph

Text Mining: tidytext

Item Response Theory (z.B. Rasch): TAM

Geospatial Data Analysis: sf

Swiss Army knife: SPSS (left) and R (right)

Open Source, Community & Support

„R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. R is a collaborative project with many contributors.“

R is not only free of charge and freely available but also has a large, active, and helpful community as well as extensive documentation, tutorials, and free learning resources.

More information on the R home page: https://www.r-project.org/

In addition, R is among the most popular programming languages worldwide and currently (April 2025) ranks 6th in the PYPL index (PopularitY of Programming Language). There are also other freely available programming languages that are widely used for statistical data analysis, e.g. Python and Julia.

PYPL PopularitY of Programming Language Index (April 2025): https://pypl.github.io/PYPL.html
Rank	Language	Share	1-year trend
1	Python	30.27 %	+1.4 %
2	Java	15.04 %	-0.6 %
3	JavaScript	7.93 %	-0.7 %
4	C/C++	6.99 %	+0.6 %
5	C#	6.2 %	-0.6 %
6	R	4.59 %	+0.0 %
7	PHP	3.74 %	-0.8 %
8	Rust	3.14 %	+0.5 %
9	TypeScript	2.79 %	-0.1 %
10	Objective-C	2.72 %	+0.3 %

Installation: R and RStudio

https://posit.co/download/rstudio-desktop/

Important

Explain what Posit is
Explain R and RStudio
Posit is developing Positron as a possible successor to RStudio: https://positron.posit.co/

Ismay, Kim, & Valdivia (2025): Analogy of difference between R and RStudio (https://moderndive.com/v2/getting-started.html)

Ismay, Kim, & Valdivia (2025): Icons of R versus RStudio on your computer (https://moderndive.com/v2/getting-started.html)

We start RStudio

Important

Open a script

Arithmetic and objects

Operator	Description	Example	Result
`+`	Addition	`5 + 3`	`8`
`-`	Subtraction	`10 - 5`	`5`
`*`	Multiplication	`3 * 5`	`15`
`/`	Division	`10 / 2`	`5`
`^`	Exponentiation	`2^3`	`8`
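The operators in the table can be tried directly in the R console. The short sketch below repeats the table's examples and adds two lines on operator precedence, which the table does not cover (an addition for illustration).

```r
5 + 3    # addition
10 - 5   # subtraction
3 * 5    # multiplication
10 / 2   # division
2^3      # exponentiation

# Operator precedence follows the usual rules; parentheses override it
2 + 3 * 4
(2 + 3) * 4
```

Parentheses are a cheap way to make the intended order of operations explicit, for example in the BMI formula.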

Body mass index (BMI) is a value derived from the mass (weight) and height of a person. The BMI is defined as the body mass divided by the square of the body height, and is expressed in units of kg/m², resulting from mass in kilograms (kg) and height in metres (m). https://en.wikipedia.org/wiki/Body_mass_index

BMI = kg/m²

Important

Create objects and compute the BMI:

Eye color (AUGENFARBE)
Weight in kg (GEWICHT_IN_KG)
Height in m (GROESSE_IN_M)
BMI


AUGENFARBE <- "Blaugrau"
GEWICHT_IN_KG <- 84
GROESSE_IN_M <- 1.80
# BMI calculation
GEWICHT_IN_KG / GROESSE_IN_M^2

[1] 25.92593


BMI <- GEWICHT_IN_KG / GROESSE_IN_M^2 # BMI calculation
BMI

[1] 25.92593


BMI * AUGENFARBE
# Error in BMI * AUGENFARBE : non-numeric argument to binary operator
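The error arises because `AUGENFARBE` holds text while `BMI` holds a number. A minimal sketch of how the type of an object can be inspected before calculating with it (the guard with `if` is one possible pattern, not the only one):

```r
BMI <- 84 / 1.80^2
AUGENFARBE <- "Blaugrau"

class(BMI)             # "numeric"
class(AUGENFARBE)      # "character"
is.numeric(AUGENFARBE) # FALSE

# Guard an arithmetic operation by checking the types first
if (is.numeric(BMI) && is.numeric(AUGENFARBE)) {
  BMI * AUGENFARBE
} else {
  message("AUGENFARBE is text, so it cannot be multiplied.")
}
```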

Vectors and datasets

NAME <-
  c("Susi", "Ali", "Klaus")

AUGENFARBE <-
  c("Blaugrau", "Braun", "Grün")

GEWICHT_IN_KG <-
  c(84, 65, 72)

GROESSE_IN_M <-
  c(1.80, 1.65, 1.75)

DATA <-
  data.frame(NAME, AUGENFARBE, GEWICHT_IN_KG, GROESSE_IN_M)

# The same call, spread over several lines for readability
DATA <-
  data.frame(NAME,
             AUGENFARBE,
             GEWICHT_IN_KG,
             GROESSE_IN_M)

DATA

   NAME AUGENFARBE GEWICHT_IN_KG GROESSE_IN_M
1  Susi   Blaugrau            84         1.80
2   Ali      Braun            65         1.65
3 Klaus       Grün            72         1.75
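Once vectors are combined into a data frame, it helps to inspect the result. The following sketch rebuilds a reduced version of the dataset and uses a few base R functions for this purpose; the choice of functions is a suggestion for illustration.

```r
NAME          <- c("Susi", "Ali", "Klaus")
GEWICHT_IN_KG <- c(84, 65, 72)
GROESSE_IN_M  <- c(1.80, 1.65, 1.75)

# The vectors need the same length to form the columns of a data frame
length(NAME) == length(GEWICHT_IN_KG)

DATA <- data.frame(NAME, GEWICHT_IN_KG, GROESSE_IN_M)

nrow(DATA)   # number of rows (persons)
ncol(DATA)   # number of columns (variables)
names(DATA)  # variable names
str(DATA)    # type of each variable
```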

Exercise

Add 2 people to the dataset and create an additional variable (e.g. number of pets). Save the dataset as the object DATA.

Solution

NAME <-
  c("Susi", "Ali", "Klaus", "Oleg", "Rudi")

AUGENFARBE <-
  c("Blaugrau", "Braun", "Grün", "Grün", "Grau")

GEWICHT_IN_KG <-
  c(84, 65, 72, 66, 70)

GROESSE_IN_M <-
  c(1.80, 1.65, 1.75, 1.63, 2.10)

ANZAHL_HAUSTIERE <-
  c(2, 0, 0, 1, 0)

DATA <-
  data.frame(NAME, AUGENFARBE, GEWICHT_IN_KG, GROESSE_IN_M, ANZAHL_HAUSTIERE)

# The same call, spread over several lines for readability
DATA <-
  data.frame(NAME,
             AUGENFARBE,
             GEWICHT_IN_KG,
             GROESSE_IN_M,
             ANZAHL_HAUSTIERE)

DATA

   NAME AUGENFARBE GEWICHT_IN_KG GROESSE_IN_M ANZAHL_HAUSTIERE
1  Susi   Blaugrau            84         1.80                2
2   Ali      Braun            65         1.65                0
3 Klaus       Grün            72         1.75                0
4  Oleg       Grün            66         1.63                1
5  Rudi       Grau            70         2.10                0

DATA$NAME

[1] "Susi"  "Ali"   "Klaus" "Oleg"  "Rudi"

DATA$BMI <-
  DATA$GEWICHT_IN_KG / DATA$GROESSE_IN_M^2 # BMI calculation

DATA

   NAME AUGENFARBE GEWICHT_IN_KG GROESSE_IN_M ANZAHL_HAUSTIERE      BMI
1  Susi   Blaugrau            84         1.80                2 25.92593
2   Ali      Braun            65         1.65                0 23.87511
3 Klaus       Grün            72         1.75                0 23.51020
4  Oleg       Grün            66         1.63                1 24.84098
5  Rudi       Grau            70         2.10                0 15.87302

DATA[3, 2] # 3rd person (row), 2nd variable (column)

[1] "Grün"

DATA[3, "GEWICHT_IN_KG"]

[1] 72

DATA[c(2,4), c("NAME", "GEWICHT_IN_KG")]

  NAME GEWICHT_IN_KG
2  Ali            65
4 Oleg            66
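Besides row numbers and column names, rows can be selected with a condition. This is a minimal sketch with a reduced version of the dataset; the object name `HEAVIER_THAN_70` and the threshold of 70 kg are chosen for illustration only.

```r
DATA <- data.frame(
  NAME          = c("Susi", "Ali", "Klaus", "Oleg", "Rudi"),
  GEWICHT_IN_KG = c(84, 65, 72, 66, 70)
)

# Logical vector: TRUE for every person heavier than 70 kg
HEAVIER_THAN_70 <- DATA$GEWICHT_IN_KG > 70
HEAVIER_THAN_70

# Use the logical vector as row index; the empty column index keeps all columns
DATA[HEAVIER_THAN_70, ]
```

The same selection is what `filter(GEWICHT_IN_KG > 70)` expresses in the tidyverse.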

Installing and loading packages (example: tidyverse, writexl & gt)

The tidyverse is a collection of R packages for data management and data analysis. It is increasingly becoming the standard in the R world: https://www.tidyverse.org/

Installing it makes the following packages available, among others:

ggplot2: Creates attractive graphics and charts
dplyr: Simplifies data manipulation (filtering, sorting, summarizing)
tidyr: Helps tidy and reshape data
readr: Imports text files (such as CSV)
tibble: A modern version of the data frame
stringr: Simplifies working with text
forcats: Helps with categorical variables
lubridate: Simplifies working with dates and times
readxl: Imports Excel files
haven: Imports data from SPSS, Stata, and SAS

Unfortunately, the tidyverse does not include a package for exporting data in Excel format. We therefore also install the writexl package: https://docs.ropensci.org/writexl/

gt is a package for styling tables: https://gt.rstudio.com/

# After installation, a package must be activated with the library() command (once per R session):

library(writexl)
library(gt)

# After installing the tidyverse:
# Whenever we work with R, the tidyverse packages must be activated (once per R session).
# This makes most tidyverse packages available.
# Some tidyverse packages must be activated separately, however (e.g. haven & readxl).

library(tidyverse) 
library(haven)
library(readxl)
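The `library()` calls above only work for packages that are already installed. A hedged sketch of how one might check this first: it uses base R `requireNamespace()` and leaves the actual `install.packages()` call commented out so that nothing is installed unintentionally. The object names `PACKAGES` and `IS_INSTALLED` are chosen for illustration.

```r
# Packages used in this section
PACKAGES <- c("tidyverse", "writexl", "gt")

# requireNamespace() returns TRUE if a package is installed and FALSE otherwise
IS_INSTALLED <- sapply(PACKAGES, requireNamespace, quietly = TRUE)
IS_INSTALLED

# Install only what is missing (commented out so that nothing is installed by accident)
# install.packages(PACKAGES[!IS_INSTALLED])
```

Installation is needed only once per computer, whereas `library()` is needed once per R session.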

Exporting data to and importing data from the working directory

Export (Excel)

Important

Save the script in a working folder (project folder)

library(writexl)
write_xlsx(DATA, path = "test.xlsx")
write_xlsx(DATA, path = "C:\\Users\\Graduiertenschule\\Desktop\\test.xlsx")
write_xlsx(DATA, "test.xlsx")
writexl::write_xlsx(DATA, "test.xlsx") # Double Colon Operator: paket::funktion()

getwd() # Get Working Directory

[1] "C:/Users/Graduiertenschule/Dropbox/GSLB/R Workshop"

# setwd("C:\\Users\\Graduiertenschule\\Desktop") # Set Working Directory

Import (Excel)

Important

Save the script in a working folder (project folder)

library(readxl) # https://readxl.tidyverse.org/
read_xlsx("test.xlsx")

# A tibble: 5 × 6
  NAME  AUGENFARBE GEWICHT_IN_KG GROESSE_IN_M ANZAHL_HAUSTIERE   BMI
  <chr> <chr>              <dbl>        <dbl>            <dbl> <dbl>
1 Susi  Blaugrau              84         1.8                 2  25.9
2 Ali   Braun                 65         1.65                0  23.9
3 Klaus Grün                  72         1.75                0  23.5
4 Oleg  Grün                  66         1.63                1  24.8
5 Rudi  Grau                  70         2.1                 0  15.9

NEW_DATA <- read_xlsx("test.xlsx")
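Export and import can be verified with a quick round trip. The sketch below is an assumption-laden illustration: it swaps in base R `write.csv()` and `read.csv()` for `write_xlsx()` and `read_xlsx()` so that it runs without extra packages, writes to a temporary folder instead of the working directory, and uses a reduced, hypothetical version of the dataset.

```r
DATA <- data.frame(
  NAME          = c("Susi", "Ali", "Rudi"),
  GEWICHT_IN_KG = c(84, 65, 70),
  GROESSE_IN_M  = c(1.80, 1.65, 2.10)
)

# file.path() builds a path with the correct separators on every operating system
EXPORT_PATH <- file.path(tempdir(), "test.csv")

write.csv(DATA, EXPORT_PATH, row.names = FALSE) # export
file.exists(EXPORT_PATH)                        # was the file written?

NEW_DATA <- read.csv(EXPORT_PATH)               # import
all.equal(DATA$GROESSE_IN_M, NEW_DATA$GROESSE_IN_M) # do export and import match?
```

Building paths with `file.path()` avoids the doubled backslashes that Windows paths otherwise require in R strings.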

Import (SPSS)

Data source (SPSS)

Díez-Palomar J, García-Carrión R, Hargreaves L, Vieites M (2020) Transforming students’ attitudes towards learning through the use of successful educational actions. PLoS ONE 15(10): e0240292.

https://doi.org/10.1371/journal.pone.0240292 (Paper)
https://doi.org/10.1371/journal.pone.0240292.s002 (SPSS dataset pone.0240292.s002.sav)

library(tidyverse) # https://tidyverse.tidyverse.org/
library(haven) # https://haven.tidyverse.org/
library(gt) # https://gt.rstudio.com/

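DATA_SPSS <- read_sav("pone.0240292.s002.sav") # Option 1: import the downloaded file from the working directory
DATA_SPSS <- read_sav("https://doi.org/10.1371/journal.pone.0240292.s002") # Option 2: import directly from the URL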

DATA_SPSS$Q1

<labelled<double>[418]>: 1) We learn best when the teacher tells us what to do
  [1]  2  1  1  1  3  2  1  3  2  4  2  1  1  1  2  1  1  2  1  1  1  2  1  3  1
 [26]  2  3  3  1  4  1  2  2  2  4  2  1  2  1  4  2  1  1  3  2  2  5  2  2  1
 [51]  3  1  2  2  1  1  1  1  3  3  2  2  2  1  2  1  1  2  1  1  2  2  2  3  3
 [76]  1  2  2  1  5  2  3  1  2  1  2  3  4  2  1  4  1  1  1  5  1  3  2  2  1
[101]  2  1  4  2  2  2  1  5  2 NA  3  3  1  3  2  1  2  2  3  2  1  2  1  1  1
[126]  1  1  1  2  2  1  1  1  2  2  1  1  1  1  1  1  1  1  1  1 NA  2  3  1  1
[151]  2  2  1  2  1  1  1  1  1  1 NA  1  3  1  5  1  3  1  5  2  2  2  1  2  1
[176]  3  1  2  1  1  1  2 NA  1  2  1  1  1  1  1  3  1  1  5  1  1  1  3  1  1
[201]  1  2  2  1  3  1  3  1  1  1  4  1  1  3  2  3  1  1  1  1  3  1  5  3  1
[226]  1  5  1  4  2  2  4  2  5  3  1  5  3  3  3  5  4  3  3  2  2  2  1  4  5
[251] NA  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  2  1  1  1  1  1  1  2  1
[276]  1  1  1  1  1  1  1  2  1  1  1  2  1  1  1  1  1  1  1  1  1  1  1  1  1
[301]  1  1  1  1  1  3  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  2  1
[326]  1  1  1  1  1  1  1  1  1  1  2  1  1  1  1  4  1  2  1  1  1  1  1  1  5
[351]  1  1  1  1  2  1  1  1  5  1  1  1  1 NA  1  1  1  1  1  3  1  1  1  2  1
[376]  1  1  1  1  1  1  4  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
[401]  2 NA  5 NA NA  3 NA  1  3 NA  4  2  1  4  1  1 NA  5

Labels:
 value             label
     0         Undefined
     1    Strongly agree
     2    Agree a little
     3          Not sure
     4 Disagree a little
     5 Strongly disagree

DATA_SPSS %>%
  mutate(Q1 = Q1 %>% as_factor()) %>% 
  count(Q1)

# A tibble: 6 × 2
  Q1                    n
  <fct>             <int>
1 Strongly agree      252
2 Agree a little       81
3 Not sure             40
4 Disagree a little    16
5 Strongly disagree    17
6 <NA>                 12
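The step from numeric codes to labelled categories can be illustrated in base R. This is a sketch of the idea only, with invented codes; it uses `factor()` and `table()` as a base R analogue and does not reproduce how haven's `as_factor()` is implemented.

```r
# Hypothetical numeric codes as stored in an SPSS file (invented for illustration)
Q1_CODES <- c(2, 1, 1, 3, NA, 5, 1)

# Attach the value labels: code 1 becomes "Strongly agree", and so on
Q1_FACTOR <- factor(
  Q1_CODES,
  levels = 1:5,
  labels = c("Strongly agree", "Agree a little", "Not sure",
             "Disagree a little", "Strongly disagree")
)

# Frequency table; useNA = "ifany" adds an entry for missing values
COUNTS <- table(Q1_FACTOR, useNA = "ifany")
COUNTS
```

Unlike `count()`, `table()` also lists categories that never occur (here "Disagree a little" with a count of 0).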

DATA_SPSS %>%
  mutate(Q1 = Q1 %>% as_factor()) %>% 
  count(Q1) %>%
  gt() %>%
  tab_options(table.align = "left")

1) We learn best when the teacher tells us what to do	n
Strongly agree	252
Agree a little	81
Not sure	40
Disagree a little	16
Strongly disagree	17
NA	12

DATA_SPSS %>%
  mutate(Q1 = Q1 %>% as_factor()) %>% 
  count(Q1) %>%
  ggplot(aes(x = n, y = Q1)) +
  geom_col() +
  labs(y = attr(DATA_SPSS$Q1, "label")) +
  scale_y_discrete(limits = rev)

NA: Not Available / Missing Values
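How missing values behave in calculations can be seen in a small sketch with invented numbers (the vector `X` is hypothetical):

```r
X <- c(2, 1, NA, 3, NA) # hypothetical values with two missing entries

is.na(X)               # TRUE where a value is missing
sum(is.na(X))          # number of missing values: 2
mean(X)                # NA, because the missing values are included
mean(X, na.rm = TRUE)  # 2, missing values are removed first
```

This is why many summary functions are called with `na.rm = TRUE`.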

Univariate descriptive statistics

Data source

Imuta K, Scarf D, Pharo H, Hayne H (2013) Drawing a Close to the Use of Human Figure Drawings as a Projective Measure of Intelligence. PLoS ONE 8(3): e58991.

https://doi.org/10.1371/journal.pone.0058991 (Paper)
https://doi.org/10.1371/journal.pone.0058991.s001 (Data in Word format)
https://github.com/PawelKulawiak/rworkshop/blob/main/DATA_doi_10.1371_journal.pone.0058991.xlsx (Data in Excel format)
https://github.com/PawelKulawiak/rworkshop/raw/refs/heads/main/DATA_doi_10.1371_journal.pone.0058991.xlsx (Direct download: data in Excel format)

Variables:

DAP_IQ: Draw-A-Person Intellectual Ability Test (German: Mann-Zeichen-Test)
WPPSI: Wechsler Preschool and Primary Scale of Intelligence

library(readxl)
DATA <- read_xlsx("DATA_doi_10.1371_journal.pone.0058991.xlsx")

Mean, standard deviation, and more

DATA$DAP_IQ

  [1]  67  72  73  79  79  83  83  84  88  89  89  92  92  93  93  93  93  93
 [19]  93  96  96  96  96  97  97  97  97  97  97  97  97  97  97  98  98  98
 [37] 100 101 101 102 103 103 106 106 106 106 106 107 108 108 108 108 109 110
 [55] 110 110 110 111 111 111 111 111 111 111 113 114 114 115 115 116 116 116
 [73] 116 117 119 119 119 119 119 119 120 121 121 121 121 121 121 122 122 122
 [91] 122 123 123 124 124 125 135 139 140 142

mean(DATA$DAP_IQ, na.rm = TRUE) # NA remove: remove missing values (NA: Not Available / Missing Values)

[1] 106.56

mean(DATA$DAP_IQ, na.rm = TRUE, trim = 0.1) # trim: trimmed mean

[1] 107.025

round(mean(DATA$DAP_IQ, na.rm = TRUE, trim = 0.1))

[1] 107

round(mean(DATA$DAP_IQ, na.rm = TRUE, trim = 0.1), 2)

[1] 107.03

MW_DAP_IQ <- round(mean(DATA$DAP_IQ, na.rm = TRUE, trim = 0.1))

round(MW_DAP_IQ, 2)

[1] 107

DATA$DAP_IQ %>%
  mean(na.rm = TRUE, trim = 0.1)

[1] 107.025

DATA$DAP_IQ %>%
  mean(na.rm = TRUE, trim = 0.1) %>%
  round()

[1] 107

DATA$DAP_IQ %>%
  mean(na.rm = TRUE, trim = 0.1) %>%
  round(2)

[1] 107.03
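What `trim = 0.1` does can be reproduced by hand. The following sketch uses ten invented values (not the study data) so that exactly one value is dropped at each end; the object names `N_DROP` and `X_TRIMMED` are chosen for illustration.

```r
X <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100) # hypothetical values with one extreme score

mean(X)              # ordinary mean, pulled upwards by the value 100
mean(X, trim = 0.1)  # trimmed mean

# The same by hand: sort, then drop 10 % of the values at each end (here 1 value)
N_DROP <- floor(length(X) * 0.1)
X_TRIMMED <- sort(X)[(N_DROP + 1):(length(X) - N_DROP)]
mean(X_TRIMMED)
```

The trimmed mean is therefore less sensitive to single extreme scores than the ordinary mean.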

Exercise

Compute the standard deviation of DAP_IQ and WPPSI and round the results to 2 decimal places. Use the pipe %>%.

Solution

DATA$DAP_IQ %>%
  sd(na.rm = TRUE) %>%
  round(2)

[1] 14.65

DATA$WPPSI %>%
  sd(na.rm = TRUE) %>%
  round(2)

[1] 11.45
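For orientation, `sd()` computes the sample standard deviation, i.e. it divides the sum of squared deviations by n - 1. A minimal sketch with invented scores that reproduces the result step by step:

```r
X <- c(2, 4, 4, 4, 5, 5, 7, 9) # hypothetical scores

N <- length(X)
DEVIATIONS <- X - mean(X)                       # distance of each score from the mean
SD_MANUAL <- sqrt(sum(DEVIATIONS^2) / (N - 1))  # divide by n - 1, then take the root

round(SD_MANUAL, 2)
round(sd(X), 2) # same result as the built-in function
```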

The tidyverse way

DATA %>%
  summarise(
    M = mean(DAP_IQ),
    SD = sd(DAP_IQ),
    n = n()
            )

# A tibble: 1 × 3
      M    SD     n
  <dbl> <dbl> <int>
1  107.  14.6   100

DATA %>%
  summarise(
    M = mean(DAP_IQ),
    SD = sd(DAP_IQ),
    n = n()
            ) %>%
  gt() # gt() from the gt package renders the table in a polished layout

M	SD	n
106.56	14.64828	100

DATA %>%
  summarise(
    M = mean(DAP_IQ),
    SD = sd(DAP_IQ),
    n = n()
            ) %>%
  round(2) %>%
  gt()

M	SD	n
106.56	14.65	100

DATA %>%
  summarise(
    M = mean(DAP_IQ) %>% round(2),
    SD = sd(DAP_IQ) %>% round(2),
    n = n()
            ) %>%
  gt() %>%
  tab_header("Draw-A-Person Intellectual Ability Test", "Descriptive Statistics")

Draw-A-Person Intellectual Ability Test
Descriptive Statistics
M	SD	n
106.56	14.65	100

Exercise

Use summarise() to create one table each for DAP_IQ and WPPSI with the following statistics:

Mean
Median
Minimum
Maximum
Standard deviation

Round the results to 2 decimal places. Use gt() to display the tables and choose meaningful titles (tab_header()). Use the pipe %>%.

Solution

DATA %>%
  summarise(
    M = mean(DAP_IQ),
    Md = median(DAP_IQ),
    SD = sd(DAP_IQ),
    Min = min(DAP_IQ),
    Max = max(DAP_IQ),
    n = n()
            ) %>%
  round(2) %>%
  gt() %>%
  tab_header("Draw-A-Person Intellectual Ability Test", "Descriptive Statistics")

Draw-A-Person Intellectual Ability Test
Descriptive Statistics
M	Md	SD	Min	Max	n
106.56	108	14.65	67	142	100

DATA %>%
  summarise(
    M = mean(WPPSI),
    Md = median(WPPSI),
    SD = sd(WPPSI),
    Min = min(WPPSI),
    Max = max(WPPSI),
    n = n()
            ) %>%
  round(2) %>%
  gt() %>%
  tab_header("Wechsler Preschool and Primary Scale of Intelligence", "Descriptive Statistics")

Wechsler Preschool and Primary Scale of Intelligence
Descriptive Statistics
M	Md	SD	Min	Max	n
104.32	107	11.45	72	128	100
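Writing one `summarise()` call per variable becomes repetitive. One possible shortcut is `across()` from dplyr, sketched here with invented scores (the data frame `DEMO` is hypothetical and not the study data):

```r
library(dplyr)

# Hypothetical scores (invented for illustration, not the study data)
DEMO <- data.frame(
  DAP_IQ = c(90, 100, 110),
  WPPSI  = c(95, 105, 115)
)

# across() applies each function in the list to each selected column;
# the result columns are named <variable>_<function>
RESULT <-
  DEMO %>%
  summarise(across(c(DAP_IQ, WPPSI), list(M = mean, SD = sd)))

RESULT
```

The result is a single wide row, which suits a quick overview; separate tables per variable, as in the solution, remain easier to title with `tab_header()`.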

Visualization (with ggplot2)

Self-portrait by Vincent van Ggplot II (1889)

Self-portrait by Vincent van Ggplot II (1888)

Boxplot, strip chart with jitter & violin plot

DATA %>% 
  ggplot(aes(x = DAP_IQ, y = ""))

DATA %>% 
  ggplot(aes(x = DAP_IQ, y = "")) +
  geom_boxplot()

DATA %>% 
  ggplot(aes(x = DAP_IQ, y = "")) +
  geom_boxplot() +
  geom_point()

DATA %>% 
  ggplot(aes(x = DAP_IQ, y = "")) +
  geom_point()

DATA %>% 
  ggplot(aes(x = DAP_IQ, y = "")) +
  geom_point() +
  geom_boxplot()

DATA %>% 
  ggplot(aes(x = DAP_IQ, y = "")) +
  geom_boxplot() +
  geom_point() +
  ylab(NULL)

DATA %>% 
  ggplot(aes(x = DAP_IQ, y = "")) +
  geom_boxplot() +
  geom_point() +
  ylab(NULL) +
  xlim(c(0,200))

DATA %>% 
  ggplot(aes(x = DAP_IQ, y = "")) +
  geom_boxplot(color = "red") +
  geom_point() +
  ylab(NULL)

DATA %>% 
  ggplot(aes(x = DAP_IQ, y = "")) +
  geom_boxplot(color = "red", fill = "lightblue") +
  geom_point() +
  ylab(NULL)

DATA %>% 
  ggplot(aes(x = DAP_IQ, y = "")) +
  geom_boxplot(color = "red", fill = "lightblue") +
  geom_point(color = "green3") + # https://r-charts.com/colors/
  ylab(NULL)

DATA %>% 
  ggplot(aes(x = DAP_IQ, y = "")) +
  geom_boxplot(fill = "#78C2AD") + # https://g.co/kgs/XambkJu
  geom_point(color = "gray25") +
  ylab(NULL)

DATA %>% 
  ggplot(aes(x = DAP_IQ, y = "")) +
  geom_boxplot(fill = "#78C2AD") + 
  geom_point(color = "gray25") +
  ylab(NULL) +
  theme_classic() # https://ggplot2.tidyverse.org/reference/ggtheme.html

DATA %>% 
  ggplot(aes(x = DAP_IQ, y = "")) +
  geom_boxplot(fill = "#78C2AD") + 
  geom_point(color = "gray25") +
  ylab(NULL) +
  theme_minimal()

DATA %>% 
  ggplot(aes(x = DAP_IQ, y = "")) +
  geom_boxplot(fill = "#78C2AD") + 
  geom_point(color = "gray25") +
  stat_summary(fun = mean, geom = "point", color = "blue", fill = "red", shape = 23, size = 3) + # https://www.sthda.com/english/wiki/ggplot2-point-shapes
  ylab(NULL) +
  theme_minimal(base_size = 15)

DATA %>% 
  ggplot(aes(x = DAP_IQ, y = "")) +
  geom_boxplot(fill = "#78C2AD") + 
  geom_jitter(color = "gray25", width = 0, height = 0.075) +
  stat_summary(fun = mean, geom = "point", color = "blue", fill = "red", shape = 23, size = 3) + 
  ylab(NULL) +
  theme_minimal(base_size = 15)

DATA %>% 
  ggplot(aes(x = DAP_IQ, y = "")) +
  geom_boxplot(fill = "#78C2AD", width = 0.3) + 
  geom_jitter(color = "gray25", width = 0, height = 0.075) +
  stat_summary(fun = mean, geom = "point", color = "blue", fill = "red", shape = 23, size = 3) + 
  ylab(NULL) +
  theme_minimal(base_size = 15)

Boxplot documentation: https://ggplot2.tidyverse.org/reference/geom_boxplot.html
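The elements of a boxplot can be computed by hand. According to the documentation linked above, the box spans the first to the third quartile and points beyond 1.5 times the interquartile range from the box are drawn individually. The sketch below follows that rule in base R with invented scores; the object names are chosen for illustration.

```r
X <- c(67, 79, 93, 97, 100, 106, 110, 116, 121, 200) # hypothetical scores

Q <- quantile(X, probs = c(0.25, 0.50, 0.75)) # lower edge of the box, median, upper edge
IQR_X <- Q[3] - Q[1]                          # interquartile range = length of the box

# Points beyond 1.5 * IQR from the box are drawn as individual outlier points
LOWER_FENCE <- Q[1] - 1.5 * IQR_X
UPPER_FENCE <- Q[3] + 1.5 * IQR_X
OUTLIERS <- X[X < LOWER_FENCE | X > UPPER_FENCE]

Q
OUTLIERS
```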

DATA %>% 
  ggplot(aes(x = DAP_IQ, y = "")) +
  geom_boxplot(fill = "#78C2AD", width = 0.4) + 
  geom_jitter(color = "gray25", width = 0, height = 0.075) +
  stat_summary(fun = mean, geom = "point", color = "blue", fill = "red", shape = 23, size = 3) + 
  ylab(NULL) +
  theme_minimal(base_size = 12) +
  xlab("Score") +
  labs(
    title = "Draw-A-Person Intellectual Ability Test",
    subtitle = "Mann-Zeichen-Test",
    caption = "Data from https://doi.org/10.1371/journal.pone.0058991",
    tag = "A)"
  )


DATA %>%
  summarise(
    Min = min(DAP_IQ),
    M = mean(DAP_IQ),
    Md = median(DAP_IQ),
    Max = max(DAP_IQ),
    SD = sd(DAP_IQ),
    n = n()
            ) %>%
  round(2) %>%
  gt() %>%
  tab_options(table.align = "left", table.width = pct(45)) %>%
  tab_header("Descriptive Statistics", "Draw-A-Person Intellectual Ability Test")

Descriptive Statistics
Draw-A-Person Intellectual Ability Test
Min	M	Md	Max	SD	n
67	106.56	108	142	14.65	100


library(patchwork) # https://patchwork.data-imaginist.com/

BOXPLOT <-
  DATA %>% 
  ggplot(aes(x = DAP_IQ, y = "")) +
  geom_boxplot(fill = "#78C2AD", width = 0.4) + 
  geom_jitter(color = "gray25", width = 0, height = 0.075) +
  stat_summary(fun = mean, geom = "point", color = "blue", fill = "red", shape = 23, size = 3) + 
  ylab(NULL) +
  theme_minimal(base_size = 12) +
  xlab("Score") +
  labs(
    title = "Draw-A-Person Intellectual Ability Test",
    subtitle = "Mann-Zeichen-Test",
    tag = "A)"
  )

TABLE <-
  DATA %>%
  summarise(
    Min = min(DAP_IQ),
    M = mean(DAP_IQ),
    Md = median(DAP_IQ),
    Max = max(DAP_IQ),
    SD = sd(DAP_IQ),
    n = n()
            ) %>%
  round(2) %>%
  gt() %>%
  tab_options(table.align = "left", table.width = px(520)) %>% 
  tab_header("Descriptive Statistics", "Draw-A-Person Intellectual Ability Test") %>%
  tab_footnote("Data from https://doi.org/10.1371/journal.pone.0058991")

BOXPLOT / TABLE # Stack the boxplot above the table

Exercise (boxplot)

Create a boxplot for WPPSI. Choose meaningful labels and titles. Change the color of the box and of the mean marker. Choose a different shape for the mean (shape).

Combine the boxplot with a violin plot:

Solution

DATA %>% 
  ggplot(aes(x = WPPSI, y = "")) +
  geom_boxplot(fill = "lightblue") + 
  geom_jitter(color = "gray25", width = 0, height = 0.075) +
  stat_summary(fun = mean, geom = "point", color = "purple", shape = 18, size = 3) + 
  ylab(NULL) +
  theme_minimal(base_size = 15) +
  ggtitle("Wechsler Preschool and Primary Scale of Intelligence") +
  xlab("Score") +
  labs(caption = "Data from https://doi.org/10.1371/journal.pone.0058991")

DATA %>% 
  ggplot(aes(x = WPPSI, y = "")) +
  geom_violin(fill = "lightblue") + 
  geom_boxplot(fill = "lightgreen", width = 0.2) + 
  geom_jitter(color = "gray25", width = 0, height = 0.075) +
  stat_summary(fun = mean, geom = "point", color = "purple", shape = 18, size = 3) + 
  ylab(NULL) +
  theme_minimal(base_size = 15) +
  ggtitle("Wechsler Preschool and Primary Scale of Intelligence") +
  xlab("Score") +
  labs(caption = "Data from https://doi.org/10.1371/journal.pone.0058991")

Raincloud Plot

Source: https://www.cedricscherer.com/2021/06/06/visualizing-distributions-with-raincloud-plots-and-how-to-create-them-with-ggplot2/

library(ggdist) # https://mjskay.github.io/ggdist/

DATA %>% 
  ggplot(aes(x = DAP_IQ, y = "")) +
  stat_slab(justification = -.2, fill = "#78C2AD") +
  geom_boxplot(width = .15, fill = "#78C2AD") +
  stat_summary(fun = mean, geom = "point", color = "purple", shape = 18, size = 3) +
  stat_dots(side = "left", justification = 1.2, color = "#78C2AD", fill = "#78C2AD") +
  ylab(NULL) +
  theme_minimal()

DATA %>% 
  ggplot(aes(x = WPPSI, y = "")) +
  stat_slab(justification = -.2, fill = "#78C2AD") +
  geom_boxplot(width = .15, fill = "#78C2AD") +
  stat_summary(fun = mean, geom = "point", color = "purple", shape = 18, size = 3) +
  stat_dots(side = "left", justification = 1.2, color = "#78C2AD", fill = "#78C2AD") +
  ylab(NULL) +
  theme_minimal()

Further links on visualization (with ggplot2)

Data management

Transform, select, and filter (mutate, select, and filter)

NAME <-
  c("Susi", "Ali", "Klaus", "Oleg", "Rudi")

AUGENFARBE <-
  c("Blaugrau", "Braun", "Grün", "Grün", "Grau")

GEWICHT_IN_KG <-
  c(84, 65, 72, 66, 70)

GROESSE_IN_M <-
  c(1.80, 1.65, 1.75, 1.63, 2.10)

ANZAHL_HAUSTIERE <-
  c(2, 0, 0, 1, 0)

DATA_2 <-
  data.frame(NAME,
             AUGENFARBE,
             GEWICHT_IN_KG,
             GROESSE_IN_M,
             ANZAHL_HAUSTIERE)

DATA_2 %>%
  gt()

NAME	AUGENFARBE	GEWICHT_IN_KG	GROESSE_IN_M	ANZAHL_HAUSTIERE
Susi	Blaugrau	84	1.80	2
Ali	Braun	65	1.65	0
Klaus	Grün	72	1.75	0
Oleg	Grün	66	1.63	1
Rudi	Grau	70	2.10	0

DATA_2$BMI <-
  DATA_2$GEWICHT_IN_KG / DATA_2$GROESSE_IN_M^2 # BMI calculation

DATA_2 %>%
  gt()

NAME	AUGENFARBE	GEWICHT_IN_KG	GROESSE_IN_M	ANZAHL_HAUSTIERE	BMI
Susi	Blaugrau	84	1.80	2	25.92593
Ali	Braun	65	1.65	0	23.87511
Klaus	Grün	72	1.75	0	23.51020
Oleg	Grün	66	1.63	1	24.84098
Rudi	Grau	70	2.10	0	15.87302

DATA_2 %>%
  mutate(bmi = GEWICHT_IN_KG / GROESSE_IN_M^2) %>% # BMI calculation
  gt()

NAME	AUGENFARBE	GEWICHT_IN_KG	GROESSE_IN_M	ANZAHL_HAUSTIERE	BMI	bmi
Susi	Blaugrau	84	1.80	2	25.92593	25.92593
Ali	Braun	65	1.65	0	23.87511	23.87511
Klaus	Grün	72	1.75	0	23.51020	23.51020
Oleg	Grün	66	1.63	1	24.84098	24.84098
Rudi	Grau	70	2.10	0	15.87302	15.87302

DATA_2 %>%
  gt()

NAME	AUGENFARBE	GEWICHT_IN_KG	GROESSE_IN_M	ANZAHL_HAUSTIERE	BMI
Susi	Blaugrau	84	1.80	2	25.92593
Ali	Braun	65	1.65	0	23.87511
Klaus	Grün	72	1.75	0	23.51020
Oleg	Grün	66	1.63	1	24.84098
Rudi	Grau	70	2.10	0	15.87302
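The two outputs above differ because the result of mutate() was only printed, not stored. A minimal sketch of this behavior, using a hypothetical mini data frame `PERSONS` (not DATA_2) and assuming dplyr is installed:

```r
library(dplyr)

# Hypothetical mini data frame (not DATA_2)
PERSONS <- data.frame(WEIGHT_KG = c(84, 65), HEIGHT_M = c(1.80, 1.65))

# Without assignment: the result is only printed, PERSONS itself is unchanged
PERSONS %>%
  mutate(bmi = WEIGHT_KG / HEIGHT_M^2)
cols_before <- names(PERSONS)

# With assignment: the new column is stored in PERSONS
PERSONS <- PERSONS %>%
  mutate(bmi = WEIGHT_KG / HEIGHT_M^2)
cols_after <- names(PERSONS)

cols_before
cols_after
```

Only after the assignment does `bmi` appear among the column names.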

DATA_2 <-
  DATA_2 %>%
  mutate(bmi = GEWICHT_IN_KG / GROESSE_IN_M^2)

DATA_2 %>%
  gt()

NAME	AUGENFARBE	GEWICHT_IN_KG	GROESSE_IN_M	ANZAHL_HAUSTIERE	BMI	bmi
Susi	Blaugrau	84	1.80	2	25.92593	25.92593
Ali	Braun	65	1.65	0	23.87511	23.87511
Klaus	Grün	72	1.75	0	23.51020	23.51020
Oleg	Grün	66	1.63	1	24.84098	24.84098
Rudi	Grau	70	2.10	0	15.87302	15.87302

DATA_2[c(2,4), c("NAME", "GEWICHT_IN_KG")] %>%
  gt()

NAME	GEWICHT_IN_KG
Ali	65
Oleg	66

DATA_2 %>%
  slice(2, 4) %>%
  select(NAME, GEWICHT_IN_KG) %>%
  gt()

NAME	GEWICHT_IN_KG
Ali	65
Oleg	66
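Bracket indexing and the slice()/select() pipeline select the same rows and columns. The sketch below checks this on a hypothetical data frame `PETS` (the names and values are assumptions, not taken from DATA_2). Because base R keeps the original row names (2 and 4), row names are reset on both results before the values are compared.

```r
library(dplyr)

# Hypothetical mini data frame (not DATA_2)
PETS <- data.frame(
  NAME   = c("A", "B", "C", "D"),
  WEIGHT = c(84, 65, 72, 66),
  N_PETS = c(2, 0, 0, 1)
)

# Base R: rows 2 and 4, columns NAME and WEIGHT
base_result <- PETS[c(2, 4), c("NAME", "WEIGHT")]

# dplyr: the same selection written as two separate steps
dplyr_result <- PETS %>%
  slice(2, 4) %>%
  select(NAME, WEIGHT)

# Reset row names so that only the values are compared
rownames(base_result)  <- NULL
rownames(dplyr_result) <- NULL
all.equal(base_result, dplyr_result)
```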

DATA_KEINE_HAUSTIERE <-
  DATA_2 %>%
  filter(ANZAHL_HAUSTIERE == 0) %>%
  select(NAME, GEWICHT_IN_KG, ANZAHL_HAUSTIERE)

DATA_KEINE_HAUSTIERE %>%
  gt()

NAME	GEWICHT_IN_KG	ANZAHL_HAUSTIERE
Ali	65	0
Klaus	72	0
Rudi	70	0

DATA_HAUSTIERE <-
  DATA_2 %>%
  filter(ANZAHL_HAUSTIERE > 0) %>%
  select(NAME, GEWICHT_IN_KG, ANZAHL_HAUSTIERE, AUGENFARBE)

DATA_HAUSTIERE %>%
  gt()

NAME	GEWICHT_IN_KG	ANZAHL_HAUSTIERE	AUGENFARBE
Susi	84	2	Blaugrau
Oleg	66	1	Grün

DATA_HAUSTIERE <-
  DATA_2 %>%
  filter(ANZAHL_HAUSTIERE > 0 & AUGENFARBE == "Grün") %>%
  select(NAME, GEWICHT_IN_KG, ANZAHL_HAUSTIERE, AUGENFARBE)

DATA_HAUSTIERE %>%
  gt()

NAME	GEWICHT_IN_KG	ANZAHL_HAUSTIERE	AUGENFARBE
Oleg	66	1	Grün

Logical operators

Operator	Meaning
`==`	equal to
`!=`	not equal to
`>`	greater than
`<`	less than
`>=`	greater than or equal to
`<=`	less than or equal to
`&`	and
`|`	or
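These operators work element by element on vectors, which is why they can be used directly inside filter(). A small base R sketch with hypothetical values (`pets` and `eyes` are made up for illustration):

```r
# Hypothetical values for five people
pets <- c(2, 0, 0, 1, 0)
eyes <- c("blue", "brown", "green", "green", "grey")

pets == 0   # equal to
pets != 0   # not equal to
pets >= 1   # greater than or equal to

# "and": both conditions must be TRUE for an element
has_pet_and_green <- pets > 0 & eyes == "green"

# "or": at least one condition must be TRUE for an element
has_pet_or_green <- pets > 0 | eyes == "green"

has_pet_and_green
has_pet_or_green
```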

LLM assistance

DATA <- DATA %>%
  mutate(DAP_IQ_CATEGORY = case_when(
    DAP_IQ > 115 ~ "High",
    DAP_IQ >= 85 & DAP_IQ <= 115 ~ "Average",
    DAP_IQ < 85 ~ "Low",
    TRUE ~ NA_character_  # Handles any NA values or other unexpected cases
  ))

DATA %>% 
  count(DAP_IQ_CATEGORY) %>%
  gt()

DAP_IQ_CATEGORY	n
Average	61
High	31
Low	8
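case_when() checks its conditions from top to bottom and uses the first one that is TRUE; values that match no condition become NA. As a sketch of what this implies, the middle condition can be shortened, and the explicit `TRUE ~ NA_character_` line can be left out. The vector `iq` is hypothetical.

```r
library(dplyr)

# Hypothetical scores, including the boundary values and a missing value
iq <- c(70, 85, 100, 115, 116, NA)

category <- case_when(
  iq > 115 ~ "High",
  iq >= 85 ~ "Average",  # only reached when iq > 115 was not TRUE
  iq < 85  ~ "Low"
)

category
```

The missing value matches none of the conditions and therefore stays NA.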

Exercise

Display the following records (filter()):

DAP_IQ > 115
DAP_IQ > 115 and WPPSI > 115
Child A

Solution

DATA %>% 
  filter(DAP_IQ > 115) %>%
  gt()

Participant	DAP_IQ	WPPSI	DAP_IQ_CATEGORY
70	116	95	High
71	116	108	High
72	116	110	High
73	116	115	High
74	117	115	High
75	119	74	High
76	119	97	High
77	119	103	High
78	119	108	High
79	119	108	High
80	119	115	High
81	120	120	High
82	121	100	High
83	121	107	High
84	121	107	High
85	121	108	High
86	121	118	High
87	121	118	High
88	122	100	High
89	122	107	High
90	122	108	High
91	122	123	High
92	123	87	High
93	123	121	High
94	124	95	High
95	124	108	High
96	125	121	High
97	135	95	High
98	139	128	High
99	140	108	High
100	142	118	High

DATA %>% 
  filter(DAP_IQ > 115, WPPSI > 115) %>%
  gt()

Participant	DAP_IQ	WPPSI	DAP_IQ_CATEGORY
81	120	120	High
86	121	118	High
87	121	118	High
91	122	123	High
93	123	121	High
96	125	121	High
98	139	128	High
100	142	118	High

DATA %>% 
  filter(DAP_IQ > 115 & WPPSI > 115) %>%
  gt()

Participant	DAP_IQ	WPPSI	DAP_IQ_CATEGORY
81	120	120	High
86	121	118	High
87	121	118	High
91	122	123	High
93	123	121	High
96	125	121	High
98	139	128	High
100	142	118	High

DATA %>% 
  filter(DAP_IQ == 119 & WPPSI == 74)

# A tibble: 1 × 4
  Participant DAP_IQ WPPSI DAP_IQ_CATEGORY
        <dbl>  <dbl> <dbl> <chr>          
1          75    119    74 High

DATA %>% 
  filter(DAP_IQ != 119 & WPPSI != 74)

# A tibble: 94 × 4
   Participant DAP_IQ WPPSI DAP_IQ_CATEGORY
         <dbl>  <dbl> <dbl> <chr>          
 1           1     67    85 Low            
 2           2     72   107 Low            
 3           3     73   102 Low            
 4           4     79    95 Low            
 5           5     79   108 Low            
 6           6     83    97 Low            
 7           7     83   103 Low            
 8           8     84   113 Low            
 9           9     88    88 Average        
10          10     89    72 Average        
# ℹ 84 more rows

DATA %>%
  filter(!(DAP_IQ == 119 & WPPSI == 74))

# A tibble: 99 × 4
   Participant DAP_IQ WPPSI DAP_IQ_CATEGORY
         <dbl>  <dbl> <dbl> <chr>          
 1           1     67    85 Low            
 2           2     72   107 Low            
 3           3     73   102 Low            
 4           4     79    95 Low            
 5           5     79   108 Low            
 6           6     83    97 Low            
 7           7     83   103 Low            
 8           8     84   113 Low            
 9           9     88    88 Average        
10          10     89    72 Average        
# ℹ 89 more rows
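The last two filters return 94 and 99 rows because negating each comparison separately is not the same as negating the combined condition. A minimal base R sketch with hypothetical scores (`a` and `b` are made up) shows the difference:

```r
# Hypothetical scores for five children
a <- c(119, 119, 100, 120, 100)
b <- c( 74, 100,  74, 110,  90)

# Negating the whole condition: everyone except the child with a == 119 and b == 74
keep_correct <- !(a == 119 & b == 74)

# Equivalent form: "not (A and B)" is the same as "(not A) or (not B)"
keep_equivalent <- a != 119 | b != 74

# Too strict: also drops children who match only one of the two values
keep_too_strict <- a != 119 & b != 74

sum(keep_correct)
sum(keep_equivalent)
sum(keep_too_strict)
```

Here the correct version keeps four of the five children, while the too-strict version keeps only two.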

Exercise

Create a new variable (mutate()) that contains the difference between DAP_IQ and WPPSI.
Which children have identical values (DAP_IQ and WPPSI)? Create a corresponding variable (mutate()).

Solution

DATA %>% 
  mutate(DIFF = DAP_IQ - WPPSI)

# A tibble: 100 × 5
   Participant DAP_IQ WPPSI DAP_IQ_CATEGORY  DIFF
         <dbl>  <dbl> <dbl> <chr>           <dbl>
 1           1     67    85 Low               -18
 2           2     72   107 Low               -35
 3           3     73   102 Low               -29
 4           4     79    95 Low               -16
 5           5     79   108 Low               -29
 6           6     83    97 Low               -14
 7           7     83   103 Low               -20
 8           8     84   113 Low               -29
 9           9     88    88 Average             0
10          10     89    72 Average            17
# ℹ 90 more rows

DATA %>% 
  mutate(DIFF = DAP_IQ - WPPSI) %>%
  mutate(GLEICH = DIFF == 0)

# A tibble: 100 × 6
   Participant DAP_IQ WPPSI DAP_IQ_CATEGORY  DIFF GLEICH
         <dbl>  <dbl> <dbl> <chr>           <dbl> <lgl> 
 1           1     67    85 Low               -18 FALSE 
 2           2     72   107 Low               -35 FALSE 
 3           3     73   102 Low               -29 FALSE 
 4           4     79    95 Low               -16 FALSE 
 5           5     79   108 Low               -29 FALSE 
 6           6     83    97 Low               -14 FALSE 
 7           7     83   103 Low               -20 FALSE 
 8           8     84   113 Low               -29 FALSE 
 9           9     88    88 Average             0 TRUE  
10          10     89    72 Average            17 FALSE 
# ℹ 90 more rows

DATA %>% 
  mutate(DIFF = DAP_IQ - WPPSI) %>%
  mutate(GLEICH = case_when(DIFF == 0 ~ "JA",
                            DIFF != 0 ~ "NEIN"))

# A tibble: 100 × 6
   Participant DAP_IQ WPPSI DAP_IQ_CATEGORY  DIFF GLEICH
         <dbl>  <dbl> <dbl> <chr>           <dbl> <chr> 
 1           1     67    85 Low               -18 NEIN  
 2           2     72   107 Low               -35 NEIN  
 3           3     73   102 Low               -29 NEIN  
 4           4     79    95 Low               -16 NEIN  
 5           5     79   108 Low               -29 NEIN  
 6           6     83    97 Low               -14 NEIN  
 7           7     83   103 Low               -20 NEIN  
 8           8     84   113 Low               -29 NEIN  
 9           9     88    88 Average             0 JA    
10          10     89    72 Average            17 NEIN  
# ℹ 90 more rows

DATA %>% 
  mutate(GLEICH = DAP_IQ == WPPSI)

# A tibble: 100 × 5
   Participant DAP_IQ WPPSI DAP_IQ_CATEGORY GLEICH
         <dbl>  <dbl> <dbl> <chr>           <lgl> 
 1           1     67    85 Low             FALSE 
 2           2     72   107 Low             FALSE 
 3           3     73   102 Low             FALSE 
 4           4     79    95 Low             FALSE 
 5           5     79   108 Low             FALSE 
 6           6     83    97 Low             FALSE 
 7           7     83   103 Low             FALSE 
 8           8     84   113 Low             FALSE 
 9           9     88    88 Average         TRUE  
10          10     89    72 Average         FALSE 
# ℹ 90 more rows

DATA %>% 
  mutate(GLEICH = DAP_IQ == WPPSI) %>%
  count(GLEICH) %>%
  gt()

GLEICH	n
FALSE	98
TRUE	2
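Because R treats TRUE as 1 and FALSE as 0 in arithmetic, the same count can also be obtained without count(). A short base R sketch with hypothetical score vectors `x` and `y`:

```r
# Hypothetical paired scores
x <- c(88, 89, 100, 120)
y <- c(88, 72, 100, 118)

same <- x == y            # logical vector: TRUE where both scores are identical

n_same    <- sum(same)    # number of identical pairs
prop_same <- mean(same)   # proportion of identical pairs

n_same
prop_same
table(same)               # counts of FALSE and TRUE
```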

R Session Info

library(devtools)
session_info()

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.5.2 (2025-10-31 ucrt)
 os       Windows 11 x64 (build 26100)
 system   x86_64, mingw32
 ui       RTerm
 language (EN)
 collate  German_Germany.utf8
 ctype    German_Germany.utf8
 tz       Europe/Berlin
 date     2025-11-13
 pandoc   3.6.3 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
 quarto   NA @ C:\\PROGRA~1\\RStudio\\RESOUR~1\\app\\bin\\quarto\\bin\\quarto.exe

─ Packages ───────────────────────────────────────────────────────────────────
 package        * version date (UTC) lib source
 abind            1.4-8   2024-09-12 [1] CRAN (R 4.5.0)
 backports        1.5.0   2024-05-23 [1] CRAN (R 4.5.0)
 bayestestR       0.16.0  2025-05-20 [1] CRAN (R 4.5.0)
 broom            1.0.8   2025-03-28 [1] CRAN (R 4.5.0)
 cachem           1.1.0   2024-05-16 [1] CRAN (R 4.5.0)
 car              3.1-3   2024-09-27 [1] CRAN (R 4.5.0)
 carData          3.0-5   2022-01-06 [1] CRAN (R 4.5.0)
 cellranger       1.1.0   2016-07-27 [1] CRAN (R 4.5.0)
 cli              3.6.5   2025-04-23 [1] CRAN (R 4.5.0)
 correlation    * 0.8.7   2025-03-03 [1] CRAN (R 4.5.0)
 curl             6.2.2   2025-03-24 [1] CRAN (R 4.5.0)
 datawizard       1.1.0   2025-05-09 [1] CRAN (R 4.5.0)
 devtools       * 2.4.5   2022-10-11 [1] CRAN (R 4.5.0)
 digest           0.6.37  2024-08-19 [1] CRAN (R 4.5.0)
 distributional   0.5.0   2024-09-17 [1] CRAN (R 4.5.2)
 dplyr          * 1.1.4   2023-11-17 [1] CRAN (R 4.5.0)
 ellipsis         0.3.2   2021-04-29 [1] CRAN (R 4.5.0)
 evaluate         1.0.3   2025-01-10 [1] CRAN (R 4.5.0)
 farver           2.1.2   2024-05-13 [1] CRAN (R 4.5.0)
 fastmap          1.2.0   2024-05-15 [1] CRAN (R 4.5.0)
 forcats        * 1.0.0   2023-01-29 [1] CRAN (R 4.5.0)
 Formula          1.2-5   2023-02-24 [1] CRAN (R 4.5.0)
 fs               1.6.6   2025-04-12 [1] CRAN (R 4.5.0)
 generics         0.1.4   2025-05-09 [1] CRAN (R 4.5.0)
 ggdist         * 3.3.3   2025-04-23 [1] CRAN (R 4.5.2)
 ggeffects        2.2.1   2025-03-11 [1] CRAN (R 4.5.0)
 ggExtra        * 0.10.1  2023-08-21 [1] CRAN (R 4.5.0)
 ggplot2        * 3.5.2   2025-04-09 [1] CRAN (R 4.5.0)
 ggpubr         * 0.6.0   2023-02-10 [1] CRAN (R 4.5.0)
 ggsignif         0.6.4   2022-10-13 [1] CRAN (R 4.5.0)
 glue             1.8.0   2024-09-30 [1] CRAN (R 4.5.0)
 gt             * 1.0.0   2025-04-05 [1] CRAN (R 4.5.0)
 gtable           0.3.6   2024-10-25 [1] CRAN (R 4.5.0)
 haven          * 2.5.4   2023-11-30 [1] CRAN (R 4.5.0)
 hms              1.1.3   2023-03-21 [1] CRAN (R 4.5.0)
 htmltools        0.5.8.1 2024-04-04 [1] CRAN (R 4.5.0)
 htmlwidgets      1.6.4   2023-12-06 [1] CRAN (R 4.5.0)
 httpuv           1.6.16  2025-04-16 [1] CRAN (R 4.5.0)
 insight          1.3.0   2025-05-20 [1] CRAN (R 4.5.0)
 jsonlite         2.0.0   2025-03-27 [1] CRAN (R 4.5.0)
 knitr            1.50    2025-03-16 [1] CRAN (R 4.5.0)
 labeling         0.4.3   2023-08-29 [1] CRAN (R 4.5.0)
 later            1.4.2   2025-04-08 [1] CRAN (R 4.5.0)
 lifecycle        1.0.4   2023-11-07 [1] CRAN (R 4.5.0)
 lubridate      * 1.9.4   2024-12-08 [1] CRAN (R 4.5.0)
 magrittr         2.0.3   2022-03-30 [1] CRAN (R 4.5.0)
 memoise          2.0.1   2021-11-26 [1] CRAN (R 4.5.0)
 mime             0.13    2025-03-17 [1] CRAN (R 4.5.0)
 miniUI           0.1.2   2025-04-17 [1] CRAN (R 4.5.0)
 parameters       0.26.0  2025-05-22 [1] CRAN (R 4.5.0)
 patchwork      * 1.3.0   2024-09-16 [1] CRAN (R 4.5.0)
 performance      0.14.0  2025-05-22 [1] CRAN (R 4.5.0)
 pillar           1.10.2  2025-04-05 [1] CRAN (R 4.5.0)
 pkgbuild         1.4.7   2025-03-24 [1] CRAN (R 4.5.0)
 pkgconfig        2.0.3   2019-09-22 [1] CRAN (R 4.5.0)
 pkgload          1.4.0   2024-06-28 [1] CRAN (R 4.5.0)
 profvis          0.4.0   2024-09-20 [1] CRAN (R 4.5.0)
 promises         1.3.2   2024-11-28 [1] CRAN (R 4.5.0)
 purrr          * 1.0.4   2025-02-05 [1] CRAN (R 4.5.0)
 quadprog         1.5-8   2019-11-20 [1] CRAN (R 4.5.0)
 R6               2.6.1   2025-02-15 [1] CRAN (R 4.5.0)
 RColorBrewer     1.1-3   2022-04-03 [1] CRAN (R 4.5.0)
 Rcpp             1.0.14  2025-01-12 [1] CRAN (R 4.5.0)
 readr          * 2.1.5   2024-01-10 [1] CRAN (R 4.5.0)
 readxl         * 1.4.5   2025-03-07 [1] CRAN (R 4.5.0)
 remotes          2.5.0   2024-03-17 [1] CRAN (R 4.5.0)
 rlang            1.1.6   2025-04-11 [1] CRAN (R 4.5.0)
 rmarkdown        2.29    2024-11-04 [1] CRAN (R 4.5.0)
 rstatix          0.7.2   2023-02-01 [1] CRAN (R 4.5.0)
 rstudioapi       0.17.1  2024-10-22 [1] CRAN (R 4.5.0)
 S7               0.2.0   2024-11-07 [1] CRAN (R 4.5.1)
 sass             0.4.10  2025-04-11 [1] CRAN (R 4.5.0)
 scales           1.4.0   2025-04-24 [1] CRAN (R 4.5.0)
 see            * 0.11.0  2025-03-11 [1] CRAN (R 4.5.0)
 sessioninfo      1.2.3   2025-02-05 [1] CRAN (R 4.5.0)
 shiny            1.10.0  2024-12-14 [1] CRAN (R 4.5.0)
 sjlabelled       1.2.0   2022-04-10 [1] CRAN (R 4.5.0)
 sjmisc           2.8.10  2024-05-13 [1] CRAN (R 4.5.0)
 sjPlot         * 2.8.17  2024-11-29 [1] CRAN (R 4.5.0)
 sjstats          0.19.0  2024-05-14 [1] CRAN (R 4.5.0)
 stringi          1.8.7   2025-03-27 [1] CRAN (R 4.5.0)
 stringr        * 1.5.1   2023-11-14 [1] CRAN (R 4.5.0)
 tibble         * 3.2.1   2023-03-20 [1] CRAN (R 4.5.0)
 tidyr          * 1.3.1   2024-01-24 [1] CRAN (R 4.5.0)
 tidyselect       1.2.1   2024-03-11 [1] CRAN (R 4.5.0)
 tidyverse      * 2.0.0   2023-02-22 [1] CRAN (R 4.5.0)
 timechange       0.3.0   2024-01-18 [1] CRAN (R 4.5.0)
 tzdb             0.5.0   2025-03-15 [1] CRAN (R 4.5.0)
 urlchecker       1.0.1   2021-11-30 [1] CRAN (R 4.5.0)
 usethis        * 3.1.0   2024-11-26 [1] CRAN (R 4.5.0)
 utf8             1.2.5   2025-05-01 [1] CRAN (R 4.5.0)
 vctrs            0.6.5   2023-12-01 [1] CRAN (R 4.5.0)
 withr            3.0.2   2024-10-28 [1] CRAN (R 4.5.0)
 writexl        * 1.5.4   2025-04-15 [1] CRAN (R 4.5.0)
 xfun             0.52    2025-04-02 [1] CRAN (R 4.5.0)
 xml2             1.3.8   2025-03-14 [1] CRAN (R 4.5.0)
 xtable           1.8-4   2019-04-21 [1] CRAN (R 4.5.0)
 yaml             2.3.10  2024-07-26 [1] CRAN (R 4.5.0)

 [1] C:/Users/Graduiertenschule/AppData/Local/R/win-library/4.5
 [2] C:/Program Files/R/R-4.5.2/library
 * ── Packages attached to the search path.

──────────────────────────────────────────────────────────────────────────────

Reuse

CC BY 4.0