This R package contains functions that I use for cleaning and transforming "dirty" data.
Please install the package remotes first. After successful installation, type remotes::install_github("joheli/kungfu"). Alternatively, download the most recent tagged compressed package from the repository's tags and install it from the R command line by typing install.packages("vX.X.X.tar.gz", repos = NULL, type = "source"), replacing vX.X.X with the most recent version.
Presently, the package contains the following functions (in alphabetical order):
- cleaner: removes duplicates in a data.frame
- df_pattern_subset: subsets a data.frame given two regex patterns marking the upper left and lower right corners of the returned data.frame
- dfilter: filters a vector of type numeric, integer, Date, or POSIXt, in a fashion that removes entries exceeding a user-specified distance from other values; e.g. "dfiltering" c(1,2,10) would remove 10, if argument max.dist is 5 (as 10 - 2 > 5); similarly, an argument min.dist can be specified to enforce a minimal distance between entries
- dlabel: wraps dfilter to label occurrences satisfying specified distance criteria (see function dfilter) in a data frame; function bc_contamination is merely a customized call of dlabel, designed to scan blood culture results for possible contamination
- pattern_join: joins two tables based on regex patterns; it is similar to function regex_join in package fuzzyjoin, which I discovered only after writing pattern_join
- postgresql_uploader: uploads a data.frame into an existing PostgreSQL table
- rbinder: function for importing and joining of multiple csv-like files with identical headers
- seamless: converts a table of intervals into a "seamless" succession of intervals
- similarity_join: joins two tables based on string similarity to a reference (e.g. dictionary of words)
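
The distance filtering described for dfilter can be illustrated with a small base R sketch. This is not the package's implementation and does not use its arguments: the helper nearest_neighbour_filter and its parameter max_dist are hypothetical names chosen for illustration, and the sketch assumes that "distance from other values" means the distance to the closest other entry of a numeric vector.

```r
# Hypothetical helper, NOT part of kungfu: keep only those values whose
# closest other value lies within max_dist (assumed reading of "distance").
nearest_neighbour_filter <- function(x, max_dist) {
  # With fewer than two entries there is no neighbour to compare against
  if (length(x) < 2) return(x)
  # Pairwise absolute distances between all entries
  d <- abs(outer(x, x, "-"))
  # Ignore the distance of each entry to itself
  diag(d) <- Inf
  # Distance from each entry to its closest other entry
  nearest <- apply(d, 1, min)
  # Drop entries that are further than max_dist from every other entry
  x[nearest <= max_dist]
}

# 10 is dropped because its closest neighbour, 2, is 8 away (8 > 5)
filtered <- nearest_neighbour_filter(c(1, 2, 10), max_dist = 5)
print(filtered)
```

For the actual behaviour, including the handling of Date and POSIXt vectors and of min.dist, please refer to the help page of dfilter.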
Please also check out the packages fuzzyjoin and janitor.
After installation, use help(*function*) or ?*function* to access the help pages of the functions listed above.