How to Datapasta

Datapasta provides RStudio addins and functions that give you complete freedom copy-paste data to and from your source editor, formatted for immediate use. Note: repeated use has been known to cause titilation and giddiness.

Places I’ve found this power useful:

  • Copying tables from Excel, Jupyter, and websites, where the source file cannot be easily read.
  • Embedding small-ish amounts of raw data from .csv into Rmarkdown files. The file thus contains code documentation and data, attaining the holy trinity of reproducibility.
  • Quickly pasting vector output from other queries into dplyr::filter( .. %in% ..).
  • Adding datasets to readily reproducible examples for posting to StackOverflow, Slack channels etc.
  • Creating c() expressions with a LOT less typing and fiddling.

Typical usage takes full advantage of addins within RStudio, however datapasta can be used with any R editor, even just the terminal. The typical RStudio case is described in full detail below, followed by the fallback behaviour.

Typical Usage with Rstudio

Pasting a table as a formatted tibble definition with tribble_paste()

You can copy this html table of Brisbane weather forecasts:

X Location Min Max
Partly cloudy. Brisbane 19 29
Partly cloudy. Brisbane Airport 18 27
Possible shower. Beaudesert 15 30
Partly cloudy. Chermside 17 29
Shower or two. Possible storm. Gatton 15 32
Possible shower. Ipswich 15 30
Partly cloudy. Logan Central 18 29
Mostly sunny. Manly 20 26
Partly cloudy. Mount Gravatt 17 28
Possible shower. Oxley 17 30
Partly cloudy. Redcliffe 19 27

And make this appear at the current cursor:

tibble::tribble(
                                  ~X,          ~Location, ~Min, ~Max,
                    "Partly cloudy.",         "Brisbane",   19L,   29L,
                    "Partly cloudy.", "Brisbane Airport",   18L,   27L,
                  "Possible shower.",       "Beaudesert",   15L,   30L,
                    "Partly cloudy.",        "Chermside",   17L,   29L,
    "Shower or two. Possible storm.",           "Gatton",   15L,   32L,
                  "Possible shower.",          "Ipswich",   15L,   30L,
                    "Partly cloudy.",    "Logan Central",   18L,   29L,
                     "Mostly sunny.",            "Manly",   20L,   26L,
                    "Partly cloudy.",    "Mount Gravatt",   17L,   28L,
                  "Possible shower.",            "Oxley",   17L,   30L,
                    "Partly cloudy.",        "Redcliffe",   19L,   27L
    )

tibble::tribble() or ‘transposed tibble’ is a really neat function that allows a tibble to be written in human readable format (Thanks be to Hadley).

To paste data as a tribble() call, just copy the table header and data rows, then paste into the source editor using the addin Paste as tribble. For best results, assign the addin to a memorable keyboard shortcut, e.g. ctrl + shift + t. See Customizing Keyboard Shortcuts.

tribble_paste() is a flexible function that guesses the separator and types of the data it pulls from the clipboard. Mostly this seems to work well. Occasionally it epic-fails. The supported separators are \| (pipe), \t (tab), , (comma), ;(semicolon). Most data copied from the internet or spreadsheets will be tab delimited. It will also attempt to recognise a lack of a header row and create a default for you, although this is not always possible.

Pasting a list as a horizontal vector with vector_paste()

A list could be a row or column of a spreadsheet or intermediate output. With the Paste as vector addin you can go from something like:

Mint    Fedora  Debian  Ubuntu  OpenSUSE

or

Mint, Fedora, Debian, Ubuntu, OpenSUSE

or

Mint
Fedora
Debian
Ubuntu
OpenSUSE

to

c("Mint", "Fedora", "Debian", "Ubuntu", "OpenSUSE")

This is pasted into the source editor at the current cursor.

Just like tribble_paste(), vector_paste() has a flexible parser that can guess the type and separator of the data. The supported separators are \| (pipe), \t (tab), , (comma), ;(semicolon) and end of line. The recommended keyboard shortcut is crtl + alt + shift + v.

Pasting a list as a vertical vector with vector_paste_vertical()

Given the same types of list inputs as above, the Paste as vector (vertical) addin pastes the output with each element on its own line, e.g.:

c("Mint",
  "Fedora",
  "Debian",
  "Ubuntu",
  "OpenSUSE")

This is much nicer for long lists. I have found this is actually the version I use more often. I recommend using ctrl + shift + v as keyboard shortcut.

##Pasting as a data.frame with df_paste() The parser here is identical to tribble_paste() and has all the same type and separator guessing goodness. The difference is the output will be a formatted call to base::data.frame(). Some sensible line wrapping rules etc are implemented. Useful for purists and educators alike. Special thanks to Jonathan Carroll for contributing this feature.

So the Brisbane weather table from above becomes:

data.frame(
           X = c("Partly cloudy.", "Partly cloudy.", "Possible shower.",
                 "Partly cloudy.", "Shower or two. Possible storm.",
                 "Possible shower.", "Partly cloudy.", "Mostly sunny.", "Partly cloudy.",
                 "Possible shower.", "Partly cloudy."),
    Location = c("Brisbane", "Brisbane Airport", "Beaudesert", "Chermside",
                 "Gatton", "Ipswich", "Logan Central", "Manly",
                 "Mount Gravatt", "Oxley", "Redcliffe"),
         Min = c(19, 18, 15, 17, 15, 15, 18, 20, 17, 17, 19),
         Max = c(29, 27, 30, 29, 32, 30, 29, 26, 28, 30, 27)
)

For a shortcut you could try ctrl + shift + d.

Outputting data from your R environment

Output to R with dpasta()

All of the above addin functions can be called directly with an R object argument. When run, this will result in the object being output at the current cursor. Usually the next line. To make things more magical, a there is a single function dpasta that will match the argument with the appropriate _paste() function based on its class. This means:

iris %>%
  head() %>%
  dpasta()

results in:

data.frame(
      Sepal.Length = c(5.1, 4.9, 4.7, 4.6, 5, 5.4),
       Sepal.Width = c(3.5, 3, 3.2, 3.1, 3.6, 3.9),
      Petal.Length = c(1.4, 1.4, 1.3, 1.5, 1.4, 1.7),
       Petal.Width = c(0.2, 0.2, 0.2, 0.2, 0.2, 0.4),
           Species = as.factor(c("setosa", "setosa", "setosa", "setosa", "setosa",
                                 "setosa"))
   )

while:

mpg %>%
  select(-class) %>%  #just to fit neatly on this page
  head() %>%
  dpasta()

will give you:

tibble::tribble(
     ~manufacturer, ~model, ~displ, ~year, ~cyl,        ~trans, ~drv, ~cty, ~hwy,  ~fl,
            "audi",   "a4",    1.8, 1999L,   4L,    "auto(l5)",  "f",  18L,  29L,  "p", 
            "audi",   "a4",    1.8, 1999L,   4L,  "manual(m5)",  "f",  21L,  29L,  "p",
            "audi",   "a4",      2, 2008L,   4L,  "manual(m6)",  "f",  20L,  31L,  "p",
            "audi",   "a4",      2, 2008L,   4L,    "auto(av)",  "f",  21L,  30L,  "p",
            "audi",   "a4",    2.8, 1999L,   6L,    "auto(l5)",  "f",  16L,  26L,  "p",
            "audi",   "a4",    2.8, 1999L,   6L,  "manual(m5)",  "f",  18L,  26L,  "p"
     )

Avoiding fiddly data formatting

There are two addins that operate on RStudio cursor selections to make your life easier:

Fiddle Selections until they’re better

Fiddle Selection is intended to remove some fiddly tasks from your workflow. It can turn raw data like 1 2 3 into c(1,2,3), then pivot from that to:

c(1,
  2,
  3)

and back again to c(1,2,3). The parser here is really flexible too. It will accept data delimited by any combination of spaces, commas, and newlines.

Fiddle Selection Can also reflow messy tribble() and data.frame() expressions into neatly aligned ones, say after hand editing.

Toggle Quotes

Toggle Vector Quotes will convert a selected expression like c(a,b,c) to a quoted version i.e c("a","b","c"). If it’s already quoted it will convert the other way to a bare version. All elements will be quoted if there’s a mixture. It also works with vertically aligned expressions.

With the combination of these two you can get really lazy e.g. go from:

some stuff I typed

#To

c("some",
  "stuff",
  "I",
  "typed") # mostly

in a couple of keystrokes!

Try assigning these addins to ctrl + shift + f and ctrl + shift + q respectively.

Output to clipboard with dmdclip()

dmdclip() can help you take the data to somewhere that uses markdown format, for example a Stack Overflow question or Github issue. This function will copy the resulting formatted data object call to the clipboard, inserting 4 spaces at the head of each line, which is markdown syntax for a pre-formatted block.

So:

iris %>%
  head() %>%
  dmdclip()

Will paste the following on the clipboard:

    data.frame(
       Sepal.Length = c(5.1, 4.9, 4.7, 4.6, 5, 5.4),
        Sepal.Width = c(3.5, 3, 3.2, 3.1, 3.6, 3.9),
       Petal.Length = c(1.4, 1.4, 1.3, 1.5, 1.4, 1.7),
        Petal.Width = c(0.2, 0.2, 0.2, 0.2, 0.2, 0.4),
            Species = as.factor(c("setosa", "setosa", "setosa", "setosa", "setosa",
                                  "setosa"))
    )

Usage without RStudio

The rstudioapi package enables the calling of addins and output to the cursor. If the API is not detected, all the _paste() functions, and dpasta will output their text to the console, ready for copying and pasting to an editor window.

In this scenario you may wish to avoid installation of the rstudioapi package dependency. Use install.packages("datapasta", dependencies = "Depends") to avoid API installation, but be sure to follow up with install.packages(c("readr","clipr")).

note: The dpasta() function can be used without clipr installed, but you’re missing out on a fair amount of awesomeness if you limit yourself to that.

Custom Behaviour for Your Unique Snowflake Setup

Custom behaviour can be created by taking advantage of the _construct() variants of the _paste() functions, as these return their output as an R object which can then be written to an appropriate buffer or clipboard.

for example, if you copied the Brisbane weather forecast from above to the clipboard and then called:

trib_call <- tribble_construct()

trib_call now contains a the tribble call as a character vector. You could then write this with:

write(trib_call, file = ..your desired location..)
#OR
clipr::write_clip(trib_call) #Send it back to the clipboard.

Configurable Options

Upping the row guard

For your protection, datapasta will initially refuse to output R objects of 200 or more rows. Up the row limit for your specific scenario with dp_set_max_rows(n). Large numbers of rows could take a long time to format. In extreme cases you could crash your R/RStudio session.

Dealing with “,” decimal marks

Use dp_set_decimal_mark(",") to handle numbers like 3,14.