Missing data is a common problem in data analysis. Sometimes the simplest option is to remove it. One approach is to remove rows that contain missing values. In this article, we look at examples of removing rows with missing values using the tidyverse in R.
How do I remove rows with missing values?
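We will use the drop_na() function to remove rows that contain missing data. It belongs to the tidyr package, which is loaded together with dplyr as part of the tidyverse. Let us load the tidyverse first.
library(tidyverse)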
As in the other tidyverse 101 examples, we will use the fantastic Palmer penguins dataset, here to illustrate how to remove rows with missing values from a data frame. Let us load the data from the cmdlinetips.com GitHub page.
path2data <- "https://raw.githubusercontent.com/cmdlinetips/data/master/palmer_penguins.csv"
penguins <- readr::read_csv(path2data)
Let us use dplyr's relocate() function to move the sex column, which has a number of missing values, to the front.
# move sex column to first
penguins <- penguins %>%
  relocate(sex)
We can see that our data frame has 344 rows in total, and some of them have missing values. Note that the fourth row has no values for most columns; the missing values are shown as NA.
penguins
## # A tibble: 344 x 7
##   sex   species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##   <chr> <chr>   <chr>           <dbl>         <dbl>             <dbl>       <dbl>
## 1 male  Adelie  Torge…           39.1          18.7               181        3750
## 2 fema… Adelie  Torge…           39.5          17.4               186        3800
## 3 fema… Adelie  Torge…           40.3          18                 195        3250
## 4 <NA>  Adelie  Torge…           NA            NA                  NA          NA
## 5 fema… Adelie  Torge…           36.7          19.3               193        3450
## 6 male  Adelie  Torge…           39.3          20.6               190        3650
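Before dropping anything, it can help to count how many values are missing. The sketch below is one way to do that with base R; the toy tibble and its values are hypothetical stand-ins for the penguins data, not part of the original dataset.

```r
library(tibble)

# Hypothetical toy data standing in for the penguins tibble
toy <- tibble(
  sex            = c("male", "female", NA, NA),
  bill_length_mm = c(39.1, 39.5, NA, 36.7),
  body_mass_g    = c(3750, 3800, NA, 3450)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))
na_per_column

# Number of rows with at least one missing value
rows_with_na <- sum(!complete.cases(toy))
rows_with_na
```

Running the same two calls on penguins would show which columns account for the missing values.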
Let us use the drop_na() function to remove rows that contain at least one missing value.
penguins %>%
  drop_na()
The resulting data frame contains 333 rows after the rows with missing values have been removed. Note that the fourth row of our original data frame had missing values and is now gone.
## # A tibble: 333 x 7
##   sex   species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##   <chr> <chr>   <chr>           <dbl>         <dbl>             <dbl>       <dbl>
## 1 male  Adelie  Torge…           39.1          18.7               181        3750
## 2 fema… Adelie  Torge…           39.5          17.4               186        3800
## 3 fema… Adelie  Torge…           40.3          18                 195        3250
## 4 fema… Adelie  Torge…           36.7          19.3               193        3450
## 5 male  Adelie  Torge…           39.3          20.6               190        3650
## 6 fema… Adelie  Torge…           38.9          17.8               181        3625
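For comparison, base R can express the same idea with complete.cases(). The following is a minimal sketch on a hypothetical toy tibble (the data are invented for illustration), showing that both approaches keep only the rows without any NA.

```r
library(tidyr)
library(tibble)

# Hypothetical toy data with missing values in different columns
toy <- tibble(
  sex            = c("male", NA, "female", "female"),
  bill_length_mm = c(39.1, 39.5, NA, 36.7),
  body_mass_g    = c(3750, 3800, 3250, 3450)
)

# tidyr: drop every row that has at least one NA
dropped <- drop_na(toy)

# base R: keep only the rows where all columns are non-missing
kept <- toy[complete.cases(toy), ]

nrow(dropped)
nrow(kept)
```

Both results keep the first and last rows of the toy data; drop_na() simply reads more naturally inside a pipe.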
How do I remove rows based on missing values in a column?
Sometimes we want to remove rows only when values are missing in one or more specific columns of the data frame. To do that, pass the column name to drop_na().
penguins %>%
  drop_na(bill_length_mm)
We removed rows based on missing values in the bill_length_mm column. Unlike the previous example, the resulting data frame still contains missing values in other columns. In this example, the sex column is missing in rows 8 to 10.
## # A tibble: 342 x 7
##    sex   species island bill_length_mm bill_depth_mm flipper_length_mm
##    <chr> <chr>   <chr>           <dbl>         <dbl>             <dbl>
##  1 male  Adelie  Torge…           39.1          18.7               181
##  2 fema… Adelie  Torge…           39.5          17.4               186
##  3 fema… Adelie  Torge…           40.3          18                 195
##  4 fema… Adelie  Torge…           36.7          19.3               193
##  5 male  Adelie  Torge…           39.3          20.6               190
##  6 fema… Adelie  Torge…           38.9          17.8               181
##  7 male  Adelie  Torge…           39.2          19.6               195
##  8 <NA>  Adelie  Torge…           34.1          18.1               193
##  9 <NA>  Adelie  Torge…           42            20.2               190
## 10 <NA>  Adelie  Torge…           37.8          17.1               186
## # … with 332 more rows, and 1 more variable: body_mass_g <dbl>
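drop_na() also accepts more than one column, and the single-column case can be written with filter() and is.na() instead. Here is a short sketch of both variants; the toy tibble is hypothetical and only meant to make the behaviour easy to check by eye.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with missing values spread over three columns
toy <- tibble::tibble(
  sex            = c("male", NA, "female", NA),
  bill_length_mm = c(39.1, 39.5, NA, 36.7),
  body_mass_g    = c(3750, 3800, 3250, NA)
)

# Drop rows where bill_length_mm is missing; NAs in other columns are kept
by_one <- toy %>% drop_na(bill_length_mm)

# The same selection written with filter() and is.na()
by_filter <- toy %>% filter(!is.na(bill_length_mm))

# Drop rows where bill_length_mm or body_mass_g is missing
by_two <- toy %>% drop_na(bill_length_mm, body_mass_g)

by_one
by_two
```

In by_two the second row survives even though its sex value is missing, because sex was not among the columns passed to drop_na().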