
Oracle loves Hadoop – integrate hadoop seamlessly into Oracle

Ever wanted to execute hadoop scripts from the Oracle environment and join the results to any oracle table? It’s easier than you think: there is no special tool required, no transient (temporary) file created, and no ETL! This can all be executed on the fly by any user, at any time. Why would you want to do it? There can be many reasons; for example, you have your customer data stored in oracle and want to join it to very large call files or web logs generated every day.


Before I explain the trick, let me show you a sample screenshot of what I mean by “seamlessly”. The sample below is a query (in this case executed in DataStudio, but it can be pretty much any client you use to connect to oracle – SQLPLUS, TOAD, SAS, MS ACCESS, … you name it). As you can see, I run a very simple query which in the background goes to the remote server, executes a hadoop script and returns the result back to oracle. The user does not need to know that the data reside on a non-oracle server (in this case hadoop). Please note the hadoop script is executed on the fly; it is not pre-executed (it runs every time you run the query).


What you need

I used a sample VMWARE Hadoop image and Oracle 11.2 Enterprise on RedHat (really nothing special, just a standard Oracle install). That’s all you need. Yup, it’s that simple.


Unix setup

No, don’t worry, there is really no special setting required. The only thing you should do is exchange public keys between the Hadoop and RedHat servers so your ssh won’t require a password. Here is a very simple how-to.
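If you just want the short version, a minimal sketch of the key exchange looks like this (I am assuming the hadoop VM is reachable as guest@192.168.149.132, the same account and host used by the script further below):

# on the RedHat (Oracle) server, as the OS user that will run the scripts
ssh-keygen -t rsa                     # accept the defaults, empty passphrase
ssh-copy-id guest@192.168.149.132     # copy the public key to the hadoop VM
ssh guest@192.168.149.132 hostname    # should now log in without a password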

Then create four unix directories (you can create just one if you wish; I try to keep things organized). A quick way to create them all is shown right after the list.

/home/jiri/Desktop/sample/data    - this will hold dummy empty file
/home/jiri/Desktop/sample/log     - this will hold oracle logs
/home/jiri/Desktop/sample/bad     - this will hold oracle reject records
/home/jiri/Desktop/sample/execute - this will hold hadoop scripts (wrapped in sh)
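Just a sketch, assuming the same base path as above:

mkdir -p /home/jiri/Desktop/sample/{data,log,bad,execute}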

I use the sample hadoop script for calculating PI, which is part of the VMWARE image. To execute it I create two unix scripts on my RedHat server. As you can see, one script is basically a simple wrapper and the other calls the hadoop script over ssh. Take these as simple examples; there are many ways to improve them and make them production ready.

/home/jiri/Desktop/sample/execute/run_hadoop.sh

#!/bin/bash
/home/jiri/Desktop/sample/execute/hadoop_script.sh 2>/dev/null

/home/jiri/Desktop/sample/execute/hadoop_script.sh – please change the IP address to your hadoop server IP or host.

#!/bin/bash
/usr/bin/ssh guest@192.168.149.132 hadoop jar hadoop-examples.jar pi 4 10000
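Both scripts have to be executable by the OS user the database runs as, so set the execute bit on them:

chmod +x /home/jiri/Desktop/sample/execute/run_hadoop.sh
chmod +x /home/jiri/Desktop/sample/execute/hadoop_script.sh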


The last step is to create a dummy empty file (it’s really just an empty file, and you can reuse it for all hadoop scripts).

/home/jiri/Desktop/sample/data/dummy.dat
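Creating it is a one-liner:

touch /home/jiri/Desktop/sample/data/dummy.dat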


Oracle setup

There is really not much to set up. I will use an external table with a preprocessor (a new 11.2 feature). I wrote about this feature in two older posts; you can find them here and here.

First let me set up the oracle directories (log in to oracle as a DBA or a user with grants to create directories)

 CREATE DIRECTORY dir_sample_data as '/home/jiri/Desktop/sample/data';
 CREATE DIRECTORY dir_sample_log  as '/home/jiri/Desktop/sample/log';
 CREATE DIRECTORY dir_sample_bad  as '/home/jiri/Desktop/sample/bad';
 CREATE DIRECTORY dir_sample_exec as '/home/jiri/Desktop/sample/execute';

then I need to grant privileges on these directories to the user who will run the hadoop query (in my case oracle user JIRI)

 GRANT READ        on DIRECTORY dir_sample_data to JIRI;
 GRANT READ, WRITE on DIRECTORY dir_sample_log  to JIRI;
 GRANT READ, WRITE on DIRECTORY dir_sample_bad  to JIRI;
 GRANT EXECUTE     on DIRECTORY dir_sample_exec to JIRI;

now log in as a normal oracle user (in my case JIRI) and create the external table

CREATE TABLE JIRI.MYHADOOP_EXT
 (
 SCRIPT_OUTPUT   VARCHAR2(2000)
 )
 ORGANIZATION EXTERNAL
 ( TYPE ORACLE_LOADER
   DEFAULT DIRECTORY dir_sample_data
   ACCESS PARAMETERS
   ( RECORDS DELIMITED BY NEWLINE
     PREPROCESSOR dir_sample_exec:'run_hadoop.sh'
     SKIP 0
     LOGFILE dir_sample_log:'dummy.log'
     BADFILE dir_sample_bad:'dummy.bad'
     NODISCARDFILE
     FIELDS TERMINATED BY '~' OPTIONALLY ENCLOSED BY '~'
     MISSING FIELD VALUES ARE NULL
   )
   LOCATION (dir_sample_data:'dummy.dat')
 )
 REJECT LIMIT 0
 NOPARALLEL
 NOMONITORING;

That’s it. Now every time user JIRI runs a query which uses the MYHADOOP_EXT table, Oracle executes the unix scripts, which run the hadoop script on the remote server and return its stdout back to the oracle user.
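For example, a query as simple as the one below launches the hadoop job on every execution; the join in the second statement is purely hypothetical (my_customers is a made-up table name) and only illustrates that the external table behaves like any other table:

-- every execution launches the hadoop job and reads its stdout
SELECT script_output FROM jiri.myhadoop_ext;

-- the external table can be joined to regular oracle tables
-- (my_customers is a made-up table name, purely for illustration)
SELECT c.customer_id, h.script_output
  FROM my_customers c
 CROSS JOIN jiri.myhadoop_ext h;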

As you can see, the oracle external table preprocessor is a very powerful tool; you can use it not only during ETL (the oracle documentation suggests an unzip example), but for pretty much any unix-level manipulation.


Query CLOB or BLOB from oracle table across DB-LINK with a simple view – Solution 2

I will continue my two-part post with a second, slightly twisted solution. I did not try to optimize this for speed; it’s a simple proof of concept. This solution requires NO staging tables, but you need a few more grants on the remote server (you need to be able to create a function, a type and a view).

Please read Solution 1 first before you continue with Solution 2; it explains why you can get the error ORA-22992: cannot use LOB locators selected from remote tables.


SOLUTION 2

This is a slightly twisted solution, but it works. Basically you first convert the LOB (in this example I refer to CLOB; BLOB is similar, though you would likely chunk it into RAW rather than varchar2) to multiple varchar2(4000) values on the remote server – for example, a CLOB of size 8100 becomes 3 varchar2(4000) chunks – then read this data across the db-link, and finally combine it back into a CLOB on the local server. This can all be done on the fly without any staging tables.


1. REMOTE SERVER

Let’s first define one table on the remote server and load it with some data.

create table RemoteTableWithClob (myid number, myclob clob);

insert into RemoteTableWithClob (myid, myclob) values (1, 'This is small test');
insert into RemoteTableWithClob (myid, myclob) values (2, 'Another Test');
insert into RemoteTableWithClob (myid, myclob) values (3, 'One more test');
commit;

Then we need to create a function for converting a CLOB to multiple varchar2 values and hide it behind a view. Please note I tried to use a collection here, but you cannot query a collection across a db-link – you would face the error ORA-22804: remote operations not permitted on object tables or user-defined type columns – so instead I convert the clob to what is basically a multi-row view (the number of records is variable and depends on the size of the LOB). This is very simple:

CREATE TYPE object_row_type AS OBJECT (
 MYORDER            NUMBER,
 MYID               NUMBER,
 MYCLOB_AS_VARCHAR  VARCHAR2(4000)    );

CREATE TYPE object_table_type AS TABLE OF object_row_type;

-- convert CLOB to multi-row varchar2
CREATE OR REPLACE FUNCTION CONVERT_CLOB_TO_VARCHAR RETURN object_table_type PIPELINED IS
PRAGMA AUTONOMOUS_TRANSACTION;

v_clob_length  number;
v_loops        number;
v_varchar_size number := 4000;

BEGIN

 FOR cur IN (SELECT myid, myclob from RemoteTableWithClob)
 LOOP
   v_clob_length := dbms_lob.getlength (cur.MyClob);
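   -- v_loops is one less than the total number of 4000-character chunks
   -- (trunc + sign below is just ceil(length/4000); the FOR loop runs 0..v_loops)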
   v_loops := trunc(v_clob_length/v_varchar_size) + sign (mod(v_clob_length,v_varchar_size ) ) - 1;

   FOR i IN 0..v_loops
   LOOP
     PIPE ROW(object_row_type(i + 1, cur.myid, dbms_lob.substr( cur.MyClob, v_varchar_size, v_varchar_size * i + 1 )));
   END LOOP;

 END LOOP;
 COMMIT;
 RETURN;
END;

Now the view on the remote server:

-- create view to show the clob converted to varchar
-- please notice there is important column MYORDER (order of individual varchar chunks)
CREATE VIEW myRemoteData as
SELECT * FROM TABLE(CONVERT_CLOB_TO_VARCHAR) a;
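A quick sanity check of the remote view (an illustrative query; with the three short sample rows above, each CLOB fits into a single chunk, so you get exactly one row per id):

-- each CLOB shows up as one or more ordered varchar2 chunks
SELECT myid, myorder, myclob_as_varchar
  FROM myRemoteData
 ORDER BY myid, myorder;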


2. LOCAL SERVER

On the local server, I read the records and convert them to a collection (this can be done in a query using CAST-MULTISET), then I use a function to convert the collection back to a CLOB.


CREATE PUBLIC DATABASE LINK RemoteServer  USING 'remoteserver';
CREATE OR REPLACE TYPE myTableType  as table of varchar2(4000);

CREATE OR REPLACE FUNCTION JIRI.F_VARCHAR_TO_CLOB (input_table_of_varchar myTableType) RETURN CLOB IS
PRAGMA AUTONOMOUS_TRANSACTION;
v_clob  clob;
BEGIN
FOR i IN 1..input_table_of_varchar.COUNT
LOOP
 -- the concatenation below could probably be done using DBMS_LOB.WRITEAPPEND :-)
 v_clob := v_clob || input_table_of_varchar(i);
END LOOP;
RETURN v_clob;
END;

And here is the final view on the local server:

-- the view first converts separate varchar2 records to a collection
-- then function is used to convert the collection to CLOB
CREATE OR REPLACE VIEW myRemoteData as
SELECT a.myid,
       F_VARCHAR_TO_CLOB( CAST(MULTISET(SELECT b.MYCLOB_AS_VARCHAR
                                          FROM myRemoteData@remoteserver b
                                         WHERE a.MYID = b.MYID
                                      ORDER BY MYORDER   ) as myTableType)
                        ) myClob
FROM REMOTETABLEWITHCLOB@remoteserver a;
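Querying the local view now returns full CLOBs reassembled from the remote chunks; a simple smoke test could look like this:

SELECT myid, dbms_lob.getlength(myclob) AS clob_length
  FROM myRemoteData;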

As you can see, the process is very simple. I tested it with a CLOB of more than 100,000 characters on my tiny VMWARE instance and got pretty decent timing. Please take the code above as a proof of concept, not a tuned, production-ready solution.


Query CLOB or BLOB from oracle table across DB-LINK with a simple view – Solution 1

This is not such an unusual scenario: you have a table with one or more LOB fields on a remote oracle server and you want to query that table across a db-link. If you try a simple SELECT statement you will be facing ORA-22992: cannot use LOB locators selected from remote tables.

Ooops, what next? There are several workarounds; let me tackle one easy solution.

This testing was done using oracle 10g running on a remote windows machine and 11.2 running on a local redhat5 server, but in general it should work for any 10g/11g servers.

For the second solution follow this link.

SOLUTION 1

The first solution is actually very simple. Create a table (I prefer a temporary table) on the local oracle server, insert the data from the remote server into it using INSERT INTO, and then query the local copy. This can all be fully automated and hidden from users …

1. REMOTE SERVER

Let’s first define one table on the remote server and load it with some data; that’s all we need to do on the remote server.

create table RemoteTableWithClob (myid number, myclob clob);

insert into RemoteTableWithClob (myid, myclob) values (1, 'This is small test');
insert into RemoteTableWithClob (myid, myclob) values (2, 'Another Test');
insert into RemoteTableWithClob (myid, myclob) values (3, 'One more test');
commit;


2. LOCAL SERVER

First, let me create a db-link (I have the same userID on both servers, so the link needs no explicit credentials) and then try to run a simple query.

CREATE PUBLIC DATABASE LINK RemoteServer  USING 'remoteserver';
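(Side note: if your accounts differ between the two servers, a fixed-user link with explicit credentials works just as well – the username and password below are placeholders.)

CREATE PUBLIC DATABASE LINK RemoteServer
  CONNECT TO remote_user IDENTIFIED BY remote_password
  USING 'remoteserver';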

select myid, substr(myclob, 1, 5) mytext from remotetablewithclob@RemoteServer;
ORA-22992: cannot use LOB locators selected from remote tables

As you can see, I cannot query the CLOB field.


3. SIMPLE WORKAROUND

The simple workaround is to bring the data from the remote server to the local server and then query the local table. I use a temporary table simply because I don’t want to worry about deleting the data or about concurrent users.

create global temporary table LocalTempTableWithClob
 (myid number,  myclob clob) ON COMMIT PRESERVE ROWS;

insert into  LocalTempTableWithClob Select * from RemoteTableWithClob@RemoteServer;
select  myid, substr(myclob, 1, 5) mytext from LocalTempTableWithClob;

MYID   MYTEXT
--------------
 1  This
 2  Anoth
 3  One m


4. NOW THE REAL TRICK

As you can see, (3) works pretty well and it could be the end of the post, but … what if you have multiple users who want to query the data? Are you going to ask them to run the insert every time they want to query the table?

Here is my solution to that problem. I basically use a function to insert the data from the remote table into the local table and then return the contents of the local table instead. This is all done in a single pass in a single function and can be hidden from users via a view.

I use a pipelined function because I really love them and hope more users will start using them.

CREATE TYPE object_row_type AS OBJECT (
 MYID               NUMBER,
 MYCLOB             CLOB             );

CREATE TYPE object_table_type AS TABLE OF object_row_type;

CREATE OR REPLACE FUNCTION GETCLOB RETURN object_table_type PIPELINED IS
PRAGMA AUTONOMOUS_TRANSACTION;
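 -- the autonomous transaction is what allows the INSERT/DELETE below
 -- even though the function is invoked from inside a SELECT statement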
BEGIN
 insert into LocalTempTableWithClob Select * from RemoteTableWithClob@RemoteServer;
 commit;
 FOR cur IN (SELECT myid, myclob from LocalTempTableWithClob)
 LOOP
   PIPE ROW(object_row_type(cur.myid, cur.myclob));
 END LOOP;
 DELETE FROM LocalTempTableWithClob;
 COMMIT;
 RETURN;
END;

create view myRemoteData as
SELECT * FROM TABLE(GetClob) a;

select myid, substr(myclob, 1, 5) mytext from myRemoteData;

MYID  MYTEXT
--------------
 1  This
 2  Anoth
 3  One m

As you can see, the solution is very simple; the function is hidden behind the view and users do not even know about the data insert.

Please note this is a bit of an oversimplified scenario. You can easily extend the function to add a custom WHERE clause (which I would highly recommend – you don’t want to move the whole table over the db-link every single time), and you can even make the function generic enough to query any LOB across a db-link.
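A minimal sketch of what a filtered variant could look like (the function name, parameter and single-id filter are purely illustrative):

CREATE OR REPLACE FUNCTION GETCLOB_BY_ID (p_myid NUMBER) RETURN object_table_type PIPELINED IS
PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
 -- only move the rows we actually need across the db-link
 insert into LocalTempTableWithClob
 select * from RemoteTableWithClob@RemoteServer where myid = p_myid;
 commit;
 FOR cur IN (SELECT myid, myclob from LocalTempTableWithClob)
 LOOP
   PIPE ROW(object_row_type(cur.myid, cur.myclob));
 END LOOP;
 DELETE FROM LocalTempTableWithClob;
 COMMIT;
 RETURN;
END;

It can then be called directly, for example SELECT * FROM TABLE(GETCLOB_BY_ID(1));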
