R on The Final Artefact

Using RScript for R Installation Management

Mon, 03 Jan 2022 00:00:00 +0000

Most frequently, users undertake common R installation and management tasks from within an R session, making use of commands like install.packages, update.packages or old.packages to obtain, update or verify packages. Those common tasks can also be accomplished via the GUI offered within RStudio, which provides an effortless mechanism for undertaking basic package management tasks. This approach is sufficient for the vast majority of cases; however, there are situations where working within the REPL^[REPL stands for Read Eval Print Loop and is usually delivered in the form of an interactive shell. While working in Python, users would commonly access the REPL by running python or ipython.] to accomplish common installation tasks is not hugely convenient.
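As a rough illustration of moving such a task out of the interactive session, the sketch below shows a small script that could be run with Rscript. The script name, the helper `missing_packages` and the CRAN mirror URL are assumptions made for this example and are not taken from the original post.

```r
#!/usr/bin/env Rscript
# install_missing.R: hypothetical script name; run for example as
#   Rscript install_missing.R data.table jsonlite

# Return the requested packages that are not yet installed.
# `installed` defaults to the packages visible to the current R library paths.
missing_packages <- function(wanted,
                             installed = rownames(installed.packages())) {
  setdiff(wanted, installed)
}

# Package names are read from the command line arguments passed to Rscript.
wanted <- commandArgs(trailingOnly = TRUE)
to_install <- missing_packages(wanted)

if (length(to_install) > 0) {
  # The mirror is an assumption; a non-interactive session needs one set explicitly.
  install.packages(to_install, repos = "https://cloud.r-project.org")
}
```

Keeping the comparison in a small function makes it possible to check the logic without touching the actual library.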

Beauty of R and Big-O

Thu, 09 Dec 2021 00:00:00 +0000

Big-O

The purpose of this post is not to provide yet another primer on the Big-O/$\Omega$/$\Theta$ notation but to share my enduring appreciation for working with R. I will introduce Big-O only briefly to provide context, but I would refer all of those who are interested to the linked materials.

What is Big-sth notation

When analysing functions, we may be interested in knowing how fast a function grows. For instance, for function $T(n)=4n^2-2n+2$, after ignoring constants, we would say that $T(n)$ grows at the order of $n^2$. With respect to the Big-O notation we would write $T(n)=O(n^2)$^[MIT. (2021, December 9). Big O notation. Introduction to Computers and Programming. Retrieved December 26, 2021, from https://web.mit.edu/16.070/www/lecture/big_o.pdf]. Most commonly, in computer science, we would differentiate between Big O, Big Theta $(\Theta)$ and Big Omega $(\Omega)$. In a nutshell, the differences between those common notations can be summarised as follows:

R-based metaprogramming strategies for handling Hive/CSV interaction (Part I, imports)

Fri, 13 Aug 2021 00:00:00 +0000

Background

Handling Hive/CSV interaction is a common reality of many analytical and data environments. The question of exporting data from Hive to CSV and other formats is frequently raised on online forums, with answers frequently suggesting the use of sed which, combined with nifty regular expressions, pipes Hive output into a flat CSV file. That is fine for simpler tables but may prove problematic for tables with a large amount of unstructured text. Import of large amounts of data is best handled by suitable tools like Apache Flume. Frequently, however, analysts and data scientists are faced with the challenge of storing data in Hive on an irregular or semi-regular basis. For instance, a job may produce new forecasting scenarios that we may want to make available through Hive tables.

Why regex is not fuzzy matching

Tue, 29 Jun 2021 00:00:00 +0000

Recently, I came across an interesting discussion on StackOverflow^[SO discussion on: Fuzzy Join with Partial String Match in R] pertaining to approaches to fuzzy matching tables in R. A good answer, contributed by one of the most resilient and excellent contributors, to whom I owe a lot of thanks for help, suggested relying on regular expressions, combined with basic string removal and transformations like toupper, to deterministically match the tables. The solution solved the problem and was accepted.
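To make the distinction concrete, here is a minimal sketch of the deterministic approach described above, using base R only. The two tables, their values and the helper `normalise_key` are hypothetical and are not taken from the StackOverflow discussion.

```r
# Hypothetical tables; names and values are illustrative only.
orders <- data.frame(customer = c("Acme Ltd.", "globex corp", "Initech"),
                     amount = c(100, 250, 75),
                     stringsAsFactors = FALSE)
accounts <- data.frame(customer = c("ACME", "GLOBEX", "UMBRELLA"),
                       region = c("North", "South", "East"),
                       stringsAsFactors = FALSE)

# Deterministic normalisation: upper-case, drop company suffixes, strip punctuation.
normalise_key <- function(x) {
  x <- toupper(x)
  x <- gsub("\\b(LTD|CORP|INC)\\b", "", x, perl = TRUE)
  gsub("[^A-Z0-9]", "", x)
}

orders$key <- normalise_key(orders$customer)
accounts$key <- normalise_key(accounts$customer)

# Exact join on the normalised key; rows without an identical key are dropped.
matched <- merge(orders, accounts, by = "key")

# A single-character typo ("ACMEE") would not match under the rules above,
# whereas an edit distance such as base R's adist() still sees it as close.
typo_distance <- adist("ACMEE", "ACME")
```

The join succeeds only where the rules anticipate the variation in the data, which is what separates this from fuzzy matching based on a distance measure.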

Using R for File Manipulation

Mon, 29 Mar 2021 00:00:00 +0000

Challenge

File manipulation is a frequent task, unavoidable in almost every IT business process. Traditionally, file manipulation tasks are accomplished within the confines of specific tools native to a given system. As such, one may consider writing and scheduling shell scripts to undertake frequent file operations, or using purpose-built tools like logrotate to archive logs or Kafka to build streaming-data pipelines. R is usually thought of as a statistical programming language or as an environment for statistical analysis. The fact that R is a mature programming language able to successfully accomplish a wide array of traditional tasks is frequently ignored. What constitutes a programming language is a valid question. Wikipedia offers a somewhat wide definition:

Inserting Data into Partitioned Table

Fri, 26 Feb 2021 00:00:00 +0000

Rationale

Maintaining partitioned Hive tables is a frequent practice in business. Properly structured tables are conducive to achieving robust performance through speeding up query execution (see Costa, Costa, and Santos 2019). Frequent use cases pertain to creating tables with a hierarchical partition structure. In the context of data that is refreshed daily, the frequently utilised partition structure reflects years, months and dates.
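As a small illustration of such a hierarchy, the sketch below derives a Hive-style partition specification from a date in R. The function name `partition_spec` and the partition column names (year, month, day) are assumptions made for this example.

```r
# Build a Hive-style partition specification from a date.
# The partition column names (year, month, day) are assumed for illustration.
partition_spec <- function(date) {
  date <- as.Date(date)
  sprintf("year=%s/month=%s/day=%s",
          format(date, "%Y"), format(date, "%m"), format(date, "%d"))
}

partition_spec("2021-02-26")
# [1] "year=2021/month=02/day=26"
```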

Creating partitioned table

In HiveQL we would create the table with the following structure using the syntax below. In order to keep the development tidy, I’m creating a separate database on Hive which I will use for the purpose of creating tables for this article.

Poor Man's Robust Shiny App Deployment (Part II)

Fri, 12 Feb 2021 00:00:00 +0000

Introduction

This article draws on the past post concerned with utilisation of golem for robust deployment of analytical and reporting solutions. For this article, we will assume that we are working with defined working requirements that utilise some of the Labour Market Statistics disseminated through the nomis portal.

Change Plan

What we have

Reporting requirements
Past scripts we used to create reports with accompanying instructions

What we want

Stronger business continuity - we want to be able to give access to this project and not be concerned with missing files, outdated or unavailable documentation and questions on how to produce updated reports. We want a self-contained entity that takes care of its technical requirements and user interaction^[A good parallel can be drawn between this approach and manuals available with life-saving equipment. Equipment delivers technical capacity and the manual ensures operational capacity. In the case of an inexperienced user, one is not useful without the other. We want to ensure that a user with the minimum required capacity can use the tools correctly.]
Better reproducibility - Easier way to re-run reports on custom parameters
Improved efficiency - We want to have a possibility of quickly creating updated and re-running past reports using the app.
Better development:
- We want to ensure that any change requests to our reporting/analytical stack won’t break crucial functionalities.
- We want to modularise development so new corporate branding or visualisation requirements can be applied with no (or minimal) integration in analytical function

Framework

Package

Future robust development owes a lot to solid foundations. As the aim is to capitalise on the robust R package architecture, we will look to leverage available supporting packages. As a first step, we will construct a new Shiny/R package infrastructure using golem.

Poor Man's Robust Shiny App Deployment

Thu, 23 Jul 2020 00:00:00 +0000

Not so uncommon problem

RStudio Connect and the more modest ShinyProxy come to mind as the most obvious solutions for deploying Shiny applications in production. Application servers are ideal for deploying applications that are to be consumed on a regular basis by larger audiences. In addition to serving the application, managing dependencies and user access or logging user activity are common tasks we would expect a publishing platform to address. Frequently, however, deployment of a Shiny application is directed at smaller audiences and less frequent usage. In such a situation, availability, accessibility and user access management requirements will often be more modest. Commonly, in business, a modelling or analytical solution can be packaged in a Shiny application facilitating periodic re-runs of models with different parameters and updated data sets. Such solutions can be conveniently utilised to facilitate development of monthly or quarterly reports. If the app is used once per month/quarter by a narrow user group, the need to deploy it on a server is not well articulated. In that particular case we are mostly interested in ensuring that we can:

Three-Way Operator in R

Fri, 08 May 2020 00:00:00 +0000

Is there a merit for a three-way operator in R?

Background

The C++20 revision added the “spaceship operator”, which is defined as follows:

(a <=> b) < 0   // if lhs < rhs
(a <=> b) > 0   // if lhs > rhs
(a <=> b) == 0  // if lhs and rhs are equal/equivalent

R implementation

The behaviour can be achieved in R in multiple ways. One straightforward approach would involve making use of the ifelse function.

`ifelse` implementation

A basic approach would involve comparing the two figures and returning -1, 0 or 1, consistent with the definition above.
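A minimal sketch of what such an implementation could look like is given below. The infix name `%<=>%` is an assumption for illustration, and the function returns plain numbers rather than the ordering object used in C++.

```r
# Hypothetical infix operator; the name %<=>% is chosen for illustration only.
# Returns -1 where lhs < rhs, 1 where lhs > rhs and 0 where the values are equal.
`%<=>%` <- function(lhs, rhs) {
  ifelse(lhs < rhs, -1, ifelse(lhs > rhs, 1, 0))
}

1 %<=>% 2
# [1] -1

# ifelse is vectorised, so whole vectors can be compared element-wise.
c(1, 5, 3) %<=>% c(2, 5, 1)
# [1] -1  0  1
```

For numeric inputs the same result could be obtained with `sign(lhs - rhs)`; the nested ifelse form is shown because it mirrors the three cases of the definition directly.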

Interactively Loading Shiny Modules

Sat, 24 Nov 2018 00:00:00 +0000

TL;DR

If you want to see the implemented solution, please refer to: GitHub repo.

Context

Shiny is a widely popular web application framework for R. In simple terms, it enables any R programmer to develop and deploy a web application. This application could be simple, such as an interactive document consisting of a few charts and tables, or a complex “behemoth” with multiple functionalities enabling end-users to run models, query external data, generate exportable reports and sophisticated visuals.

ASCII charts in R

Fri, 05 Jun 2015 00:00:00 +0000

In Stata it is possible to use the plot command in order to get a simple scatter plot in the Stata console. As of Stata 8, plot is no longer supported but remains a useful tool for quickly exploring relationships between variables. Using plot on the auto data provides the following results:

Now the question is: can we achieve the same level of convenience in R? Of course. The txtplot package authored by Bjoern Bornkamp provides similar functionality. Executing the code below will generate a nice text plot straight in the R console:

Managing rows in the ggplot legend

Sat, 28 Mar 2015 00:00:00 +0000

After developing the Shiny App sourcing live labour market data from NOMIS, I wanted to accommodate a convenient way of managing rows in the legend. In particular, I wanted to account for the situation where the end-user may select a number of geographies that will only conveniently fit into two or more rows. After transposing the data to long format, guessing the number of elements in the legend is relatively simple, as it will correspond to the number of unique geographies passed via the subset command.
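The row calculation described above can be sketched as follows. The helper `legend_rows`, the `per_row` parameter and the sample geographies are hypothetical and are not taken from the original app.

```r
# Number of legend rows needed when at most `per_row` keys fit on one row.
# `per_row` is a hypothetical layout parameter chosen for this example.
legend_rows <- function(geographies, per_row = 4) {
  ceiling(length(unique(geographies)) / per_row)
}

# Long-format data repeats each geography, so only unique values are counted.
geographies <- c("Glasgow", "Edinburgh", "Aberdeen", "Dundee", "Glasgow", "Perth")
n_rows <- legend_rows(geographies)  # 5 unique geographies, 4 per row

# With ggplot2 available, the value can be passed to the legend guide, e.g.
# p + ggplot2::guides(colour = ggplot2::guide_legend(nrow = n_rows))
```

Here `p` stands for an existing ggplot object with a colour legend; the call is left as a comment so that the sketch runs without ggplot2 installed.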

Amusing way to get user input windows in R

Wed, 01 Feb 2012 00:00:00 +0000

In the unlikely scenario that beautiful Shiny apps do not meet your analytical requirements and developing a full-blown user interface in RGtk2 seems a little too much, there is a third, often overlooked solution: the svDialogs package by Philippe Grosjean. The package conveniently enables the user to create various interface gadgets. For example, the code:

require(svDialogs)

## Let's keep some data in one place
user_figure <- svDialogs::dlg_input()

would result in the following window being presented to the user:
