<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Hive on The Final Artefact</title><link>https://www.thefinalartefact.xyz/tags/hive/</link><description>Recent content in Hive on The Final Artefact</description><generator>Hugo</generator><language>en-gb</language><lastBuildDate>Fri, 13 Aug 2021 00:00:00 +0000</lastBuildDate><atom:link href="https://www.thefinalartefact.xyz/tags/hive/index.xml" rel="self" type="application/rss+xml"/><item><title>R-based metaprogramming strategies for handling Hive/CSV interaction (Part I, imports)</title><link>https://www.thefinalartefact.xyz/post/importing-csv-to-hive/</link><pubDate>Fri, 13 Aug 2021 00:00:00 +0000</pubDate><guid>https://www.thefinalartefact.xyz/post/importing-csv-to-hive/</guid><description>&lt;h2 id="background"&gt;Background&lt;/h2&gt;
&lt;p&gt;Handling Hive/CSV interaction is a common reality in many analytical and data environments. The question of how to export data from Hive to CSV and other formats is frequently raised on online forums, with answers often suggesting &lt;a href="https://en.wikipedia.org/wiki/Sed"&gt;&lt;code&gt;sed&lt;/code&gt;&lt;/a&gt; which, combined with nifty regular expressions, pipes Hive output into a flat CSV file. That is fine for simpler tables but may prove problematic for tables with a large amount of unstructured text. Import of large amounts of data is best handled by suitable tools like &lt;a href="https://flume.apache.org"&gt;Apache Flume&lt;/a&gt;. Frequently, however, analysts and data scientists are faced with the challenge of storing data in Hive on an irregular or semi-regular basis. For instance, a job may produce new forecasting scenarios that we may want to make available through a Hive table.&lt;/p&gt;</description></item></channel></rss>