R-based metaprogramming strategies for handling Hive/CSV interaction (Part I, imports)

Fri, 13 Aug 2021 00:00:00 +0000

Background

Handling Hive/CSV interaction is a common reality in many analytical and data environments. The question of how to export data from Hive to CSV and other formats is raised frequently on online forums, and the answers often suggest using sed, combined with nifty regular expressions, to pipe Hive output into a flat CSV file. That is fine for simpler tables but may prove problematic for tables with a large amount of unstructured text. Imports of large amounts of data are best handled by suitable tools like Apache Flume. Frequently, however, analysts and data scientists face the challenge of storing data in Hive on an irregular or semi-regular basis. For instance, a job may produce new forecasting scenarios that we may want to make available through a Hive table.
