SHA1 with BASE64 Hadoop Hive UDF


This is a simple Hive UDF that hashes a string with SHA-1 and encodes the digest as Base64. It works like a charm in Hadoop Hive (tested with CDH 4.2.1).


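package io.jackass.hadoop.hive.udf.crypto;

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

import org.apache.commons.codec.binary.Base64;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/**
 * Hive UDF returning Base64(SHA-1(s)), or NULL for NULL input.
 * The lowercase class name is kept because CREATE TEMPORARY FUNCTION refers to it.
 */
public final class sha1 extends UDF {

    public Text evaluate(final Text s) {
        if (s == null) {
            return null;
        }
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            // Text holds UTF-8 bytes; only the first getLength() bytes are valid,
            // and hashing them directly avoids the platform-default charset.
            md.update(s.getBytes(), 0, s.getLength());
            byte[] hash = md.digest();
            // encodeBase64(byte[]) does not chunk, so no line separators are added.
            return new Text(Base64.encodeBase64(hash));
        } catch (NoSuchAlgorithmException nsae) {
            // Every Java platform must provide SHA-1, so this should not happen.
            throw new IllegalStateException("SHA-1 is not available", nsae);
        }
    }
}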

Using it in Hive is really simple:


    ADD JAR hive-crypto-udfs-1.0.jar;
    CREATE TEMPORARY FUNCTION sha1 AS 'io.jackass.hadoop.hive.udf.crypto.sha1';
    SELECT sha1('1111') FROM your_table;
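To check the result of a query like the one above without a cluster, the same transformation can be reproduced with the JDK alone. This is a minimal sketch, assuming Java 8+ (for java.util.Base64) and that the UDF hashes the UTF-8 bytes of the string; for ASCII input such as '1111' the charset makes no difference. It uses standard Base64 with = padding, which is what commons-codec produces by default. The class name Sha1Base64Check is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public final class Sha1Base64Check {

    // SHA-1 over the UTF-8 bytes of s, then standard Base64 (with padding).
    static String sha1Base64(String s) {
        if (s == null) {
            return null; // same NULL-in, NULL-out behaviour as the UDF
        }
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] hash = md.digest(s.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(hash);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    public static void main(String[] args) {
        // Defaults to the literal used in the Hive query above.
        String input = args.length > 0 ? args[0] : "1111";
        System.out.println(sha1Base64(input));
    }
}
```

Compile it with `javac Sha1Base64Check.java`, run `java Sha1Base64Check 1111`, and compare the printed value with what Hive returns. As a known reference point, the standard SHA-1 test string abc encodes to qZk+NkcGgWq6PiVxeFDCbJzQ2J0=.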

If you need help building the JAR file, here is the old-school javac way (tested with CDH 4.2.1):

  • Save the code above as io/jackass/hadoop/hive/udf/crypto/sha1.java (the directory path mirrors the package name).
  • Run the commands below from the directory that contains io/:

        CP=$(find "/opt/cloudera/parcels/CDH/lib" -name '*.jar' -printf '%p:' | sed 's/:$//')
        javac -classpath "$CP" io/jackass/hadoop/hive/udf/crypto/sha1.java
        jar -cf hive-crypto-udfs-1.0.jar io/jackass/hadoop/hive/udf/crypto/*.class
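If Hive cannot load the class named in CREATE TEMPORARY FUNCTION, check that sha1.class is stored in the JAR under the path that mirrors its package; `jar tf hive-crypto-udfs-1.0.jar` lists the entries. The same check as a small JDK-only sketch (Java 7+; the class name JarCheck is hypothetical, and the JAR path defaults to the file built above):

```java
import java.io.IOException;
import java.util.jar.JarFile;

public final class JarCheck {

    // Entry name Hive needs for io.jackass.hadoop.hive.udf.crypto.sha1
    static final String UDF_ENTRY = "io/jackass/hadoop/hive/udf/crypto/sha1.class";

    // Returns true if the JAR contains the UDF class at its package path.
    // Throws IOException if the file is missing or is not a valid JAR.
    static boolean containsUdf(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(UDF_ENTRY) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "hive-crypto-udfs-1.0.jar";
        System.out.println((containsUdf(path) ? "OK: " : "MISSING: ") + UDF_ENTRY);
    }
}
```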
    