charlatan

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

charlatan makes fake data. It is inspired by, and borrows some code from, Python's faker (https://github.com/joke2k/faker).

Make fake data for:

  • person names
  • jobs
  • phone numbers
  • colors: names, hex, rgb
  • credit cards
  • DOIs
  • numbers in range and from distributions
  • gene sequences
  • geographic coordinates
  • emails
  • URIs, URLs, and their parts
  • IP addresses
  • more coming ...

Possible use cases for charlatan:

  • Students in a classroom setting learning any task that needs a dataset.
  • People doing simulations/modeling who need some fake data
  • Generate a fake dataset of users for a database before actual users exist
  • Complete missing spots in a dataset
  • Generate fake data to replace sensitive real data before public release
  • Create a random set of colors for visualization
  • Generate random coordinates for a map
  • Get a set of randomly generated DOIs (Digital Object Identifiers) to assign to fake scholarly artifacts
  • Generate fake taxonomic names for a biological dataset
  • Get a set of fake sequences to use to test code/software that uses sequence data
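
As a minimal sketch of the "replace sensitive real data" use case above, the example below overwrites identifying columns of a small data.frame with generated values. The `patients` data.frame and its column names are hypothetical and invented for illustration; only `ch_name()` and `ch_phone_number()` come from charlatan.

```r
library(charlatan)

# Hypothetical data set with sensitive columns; every value here is invented.
patients <- data.frame(
  name = c("Real Person A", "Real Person B", "Real Person C"),
  phone = c("555-0100", "555-0101", "555-0102"),
  score = c(12.5, 9.1, 14.3),
  stringsAsFactors = FALSE
)

# Overwrite the identifying columns with generated values, one per row.
masked <- patients
masked$name <- ch_name(nrow(patients))
masked$phone <- ch_phone_number(nrow(patients))

# The non-sensitive measurement column is left untouched.
masked
```

Generated values are random, so the same real person gets a different fake name on each run. If a stable mapping is needed, it has to be stored separately.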

Reasons to use charlatan:

  • Lightweight, with few dependencies
  • Relatively comprehensive coverage of data types, with more being added
  • Comprehensive set of languages supported, with more being added
  • Useful R features such as creating entire fake data.frames

Installation

CRAN version

install.packages("charlatan")

dev version

remotes::install_github("ropensci/charlatan")
library("charlatan")

high level function

... for all fake data operations

x <- fraudster()
x$job()
#> [1] "Toxicologist"
x$name()
#> [1] "Bart Franecki"
x$color_name()
#> [1] "IndianRed"
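
The high-level object can also be tied to a locale, so that every call on it uses the same language. The sketch below assumes that `fraudster()` accepts a `locale` argument in the same way the `ch_*()` functions do; check `?fraudster` for the installed version. Output is random, so none is shown.

```r
library(charlatan)

# Assumption: fraudster() takes a `locale` argument, like ch_job().
fr <- fraudster(locale = "fr_FR")

# Each call returns a randomly chosen French job title.
fr_job <- fr$job()
fr_job
```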

locale support

More locales are being added over time. For example:

Locale support for job data

ch_job(locale = "en_US", n = 3)
#> [1] "Ranger/warden"       "Psychotherapist"     "Immigration officer"
ch_job(locale = "fr_FR", n = 3)
#> [1] "Géotechnicien"                               
#> [2] "Professeur documentaliste"                   
#> [3] "Ingénieur efficacité énergétique du bâtiment"
ch_job(locale = "hr_HR", n = 3)
#> [1] "Policajac"                           "Voditelj projekta"                  
#> [3] "Zdravstveno laboratorijski tehničar"
ch_job(locale = "uk_UA", n = 3)
#> [1] "Фотограф" "Зоолог"   "Мірошник"
ch_job(locale = "zh_TW", n = 3)
#> [1] "CNC電腦程式編排人員" "特用化學工程師"      "財務或會計主管"

For colors:

ch_color_name(locale = "en_US", n = 3)
#> [1] "DarkSlateGray" "Indigo"        "NavajoWhite"
ch_color_name(locale = "uk_UA", n = 3)
#> [1] "Червоно-буро-помаранчевий" "Темно-лососевий"          
#> [3] "Блідо-брунатний"

More coming soon ...

generate a dataset

ch_generate()
#> # A tibble: 10 x 3
#>    name                     job                        phone_number      
#>    <chr>                    <chr>                      <chr>             
#>  1 Mr. Posey Stehr III      Immigration officer        +61(2)7879379341  
#>  2 Ms. Henriette Wiegand    Catering manager           1-580-580-8638x830
#>  3 Irena Russel             Retail banker              +04(7)9699546042  
#>  4 Dr. Daniel Bechtelar DDS Architectural technologist 1-834-397-4529x863
#>  5 Dr. Kasey Davis          Designer, jewellery        351.022.9534x24105
#>  6 London Hansen-Hackett    Graphic designer           +06(5)1147537086  
#>  7 Lilyana Runte            Counsellor                 01692508550       
#>  8 Shaquana Herzog          Theme park manager         667.617.8036x99553
#>  9 Maybell Raynor-Hartmann  Writer                     (616)978-2091     
#> 10 Averie Murphy            Community pharmacist       1-111-441-1704
ch_generate('job', 'phone_number', n = 30)
#> # A tibble: 30 x 2
#>    job                                         phone_number      
#>    <chr>                                       <chr>             
#>  1 Armed forces training and education officer 1-673-556-2393x997
#>  2 Soil scientist                              1-296-630-3970    
#>  3 Optician, dispensing                        1-678-990-8871    
#>  4 Learning disability nurse                   461.171.6544      
#>  5 Editor, commissioning                       05011328685       
#>  6 Designer, exhibition/display                +26(6)2762788230  
#>  7 Financial risk analyst                      1-636-012-0957x508
#>  8 Scientist, biomedical                       719.524.4489      
#>  9 Teacher, English as a foreign language      +54(0)1232453568  
#> 10 Lecturer, higher education                  (853)580-9291x3186
#> # … with 20 more rows
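
For tests or teaching material it is often useful to get the same fake dataset on every run. The sketch below assumes that charlatan draws its randomness from R's global random number generator, which this README does not state. If that holds, calling `set.seed()` before `ch_generate()` should make the result repeatable.

```r
library(charlatan)

# Fix the seed, generate a small dataset, then repeat with the same seed.
set.seed(42)
first <- ch_generate("name", "job", n = 5)

set.seed(42)
second <- ch_generate("name", "job", n = 5)

# TRUE if both runs produced the same rows (depends on the assumption above).
identical(first, second)
```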

person name

ch_name()
#> [1] "Kara Boehm"
ch_name(10)
#>  [1] "Rebecca Monahan"        "Suzann Franecki"        "Debby Nikolaus"        
#>  [4] "Ama Ullrich"            "Arba Volkman"           "Antony Mueller"        
#>  [7] "Ms. Cinnamon Anderson"  "Iver Hermann"           "Shirleen Mills-Schmidt"
#> [10] "Hadley Little"

phone number

ch_phone_number()
#> [1] "+36(0)2342842531"
ch_phone_number(10)
#>  [1] "08296463291"        "970.366.6818"       "01055866557"       
#>  [4] "01717878683"        "785-103-9978"       "1-079-787-2377x619"
#>  [7] "323.362.8212"       "1-303-274-5722"     "493.066.7885x8181" 
#> [10] "610.791.1645x3705"

job

ch_job()
#> [1] "Therapeutic radiographer"
ch_job(10)
#>  [1] "Environmental manager"               "Designer, blown glass/stained glass"
#>  [3] "Conservator, furniture"              "Copy"                               
#>  [5] "Administrator, local government"     "Investment analyst"                 
#>  [7] "Public librarian"                    "Engineer, materials"                
#>  [9] "Mechanical engineer"                 "Forest/woodland manager"

credit cards

ch_credit_card_provider()
#> [1] "VISA 16 digit"
ch_credit_card_provider(n = 4)
#> [1] "VISA 16 digit"    "JCB 15 digit"     "JCB 15 digit"     "American Express"
ch_credit_card_number()
#> [1] "561223593016571"
ch_credit_card_number(n = 10)
#>  [1] "54998053024724596"   "869968125239286630"  "210063772612064392" 
#>  [4] "4060155369087233"    "501898051709842"     "3712676203745602"   
#>  [7] "3461064670166497"    "3096517555374787348" "3158434698000233509"
#> [10] "3037311974396594"
ch_credit_card_security_code()
#> [1] "811"
ch_credit_card_security_code(10)
#>  [1] "598"  "164"  "0297" "083"  "741"  "519"  "948"  "452"  "6641" "286"

Meta

  • Please report any issues or bugs.
  • License: MIT
  • Get citation information for charlatan in R with citation(package = 'charlatan')
  • Please note that this package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
