Feeds:
Posts
Comments

I’ve seen this over and over. Is it possible to load Excel to Oracle database some easy to automate way? Of course there is, there are actually many ways. My favorite is (7).

1. client side executed solution – VBA macro or COM to directly write to Oracle Database (some manual file manipulation required each time you want to load the file )

This is certainly possible and is not complex to code, but I don’t like it at all for two reasons – first you need to install (if its COM) or at least run (in case of VBA) script on client machine and either hardcode uid/psw or allow the user access directly to Oracle Database. Certainly a lot of places where things can go wrong. One place where I used this was ESSBASE – it has decent excel plug-in and allows to write back to essbase cube  (yeah I know essbase is not oracle database).

2. client side executed solution – web app (some manual file manipulation required each time you want to upload the file)

This is a bit nicer solution simply because there is nothing to install and even authentication can be very seamless. If you use for example Oracle Apex, you can use transparent AD authentication. The app can work the way Oracle Apex imports excel data (copy-paste from excel to a text field in the page) which basically pastes the data as tab delimited structure (and saves probably as CLOB if you use Apex).

3. ask for a file in CSV format (some manual file manipulation required each time you want to load the file)

OK, this is obvious easy to deal with format, I’ve used this method many many times and burned my fingers over and over. You would be surprised how many users have problems with SAVE-AS CSV

4.  ask for XML file (no manual work required if excel is kept in XML format)

This is very interesting solution since XML is somewhat natural excel format. If the user keeps his/her file in XML then there is no special file manipulation required on his/her side. Oracle can easily load XML. If you start digging into this, you might see how complex the XML is and how “not so easy” this can be.

5. server side VBScript (can be very easily automated)

If you are lucky and have Windows server with Excel installed then you can easily use Worksheet.SaveAs Method to convert excel to CSV. One strong advantage I see here is the fact that you use Excel object model to manipulate Excel file and thus it should be easy to move to new Excel format (in case of any future  company wide conversion).

6. JAVA in Oracle JVM (can be very easily automated)

I’m really not JAVA guru, but … there are few easy to use conversion classes. Since  Java can run in Oracle JVM and you can use PL/SQL to execute it, it can be painless very easy to use solution. Please note JVM is isolated from Oracle database and thus most of the out-of-the-box classes out there are going to create CSV file in the file system. I found very nice sample here http://www.dynamicpsp.com/!go?ln=nfaq&faq=FAQ9

7. unix SCRIPT executed either stand alone or using 11.2 external table preprocessor (can be very easily automated)

You can either use JAVA (similar to above) or PERL or …  this is really easy to use solution and if you are lucky and have oracle 11.2 then you can utilize preprocessor and simple SELECT * FROM MY_EXTERNAL_TABLE is going to take care of conversion as well as loading to oracle – this is the way I like !! No scripts to execute, no special manual manipulations, simple SELECT * FROM … is taking care of everything.

here is small how-to for (7)  in 11.2

1. take the sample below and create Book1.xls (simply copy paste the content below to blank excel sheet)

2. create four oracle directories pointing to four unix directories (you can of course create one for all files, but let’s try the security tight)

 CREATE DIRECTORY dir_sample_data as '/home/jiri/Desktop/sample/data';
 CREATE DIRECTORY dir_sample_log as  '/home/jiri/Desktop/sample/log';
 CREATE DIRECTORY dir_sample_bad as  '/home/jiri/Desktop/sample/bad';
 CREATE DIRECTORY dir_sample_exec as '/home/jiri/Desktop/sample/execute';

 GRANT READ        on DIRECTORY dir_sample_data to JIRI;
 GRANT READ, WRITE on DIRECTORY dir_sample_log  to JIRI;
 GRANT READ, WRITE on DIRECTORY dir_sample_bad  to JIRI;
 GRANT EXECUTE     on DIRECTORY dir_sample_exec to JIRI;

3. place Book1.xls to /home/jiri/Desktop/sample/data

4. download and install xls2csv perl script - http://search.cpan.org/~ken/xls2csv-1.06/script/xls2csv

5. place script below to /home/jiri/Desktop/sample/execute/run_convert.sh – the script executes perl conversion script and spools the output file to stdout (source for oracle external table)

#!/bin/bash
currentDir="/home/oracle/Desktop/sample/data"
/usr/bin/xls2csv -q -x $currentDir/Book1.xls -w Sheet1 -c $currentDir/Book1.dat
/bin/cat $currentDir/Book1.dat

6. create external table with preprocessor and load the data. Please note the script above is going to be executed every time you run SELECT * FROM jiri.sample_01_ext

CREATE TABLE JIRI.SAMPLE_01_EXT
(
 CUSTOMER_ID     NUMBER(10),
 CUSTOMER_NM     VARCHAR2(60),
 CUSTOMER_SSN    VARCHAR2(12)
)
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
 DEFAULT DIRECTORY dir_sample_data
 ACCESS PARAMETERS
 ( RECORDS DELIMITED BY NEWLINE
 PREPROCESSOR dir_sample_exec:'run_convert.sh'
 SKIP 1
 LOGFILE dir_sample_log:'Book1.log'
 BADFILE dir_sample_bad:'Book1.bad'
 NODISCARDFILE
 FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
 MISSING FIELD VALUES ARE NULL
 )
 LOCATION (dir_sample_data:'Book1.xls')
)
REJECT LIMIT 0
NOPARALLEL
NOMONITORING;

It cannot be easier than that. There is one small issue with the script. It is really not that critical, but as you can see run_convert.sh creates CSV file first and thus if two users run the select at the same time, you can might have a conflict. You can of course create unique file name each time or even better change the perl script a bit (this is actually very very easy change) and run the output to stdout instead of a file.

This is very bold statement and really does not truly reflect small trick I would like to show you.

Oracle 11.2 has very interesting new feature -EXTERNAL TABLES PREPROCESSOR. It basically allows to execute any unix script and use the output of the script as a source of external table (classic example is reading data from compressed archive and loading them directly to oracle without any transient files). Please see my other article what’s new in Oracle 11g R2 – external tables preprocessor for more details.

Thanks to this preprocessor you can load pretty much anything even output from any unix script – yes you can read for example df -k on a fly without creating any transient files in unix.

1. Create oracle directories which correspond to existing directories on unix.

 CREATE DIRECTORY dir_sample_data as '/home/jiri/Desktop/sample/data';
 CREATE DIRECTORY dir_sample_log as '/home/jiri/Desktop/sample/log';
 CREATE DIRECTORY dir_sample_bad as '/home/jiri/Desktop/sample/bad';
 CREATE DIRECTORY dir_sample_exec as '/home/jiri/Desktop/sample/execute';
 GRANT READ        on DIRECTORY dir_sample_data to JIRI;
 GRANT READ, WRITE on DIRECTORY dir_sample_log  to JIRI;
 GRANT READ, WRITE on DIRECTORY dir_sample_bad  to JIRI;
 GRANT EXECUTE     on DIRECTORY dir_sample_exec to JIRI;

2. place dummy.dat file to /home/jiri/Desktop/sample/data directory – this is dummy empty file

3. place unix script below to /home/jiri/Desktop/sample/execute/run_my_script.sh

#!/bin/bash
 /bin/df -k

4. create external table with preprocessor

CREATE TABLE JIRI.SAMPLE_01_EXT
 (
 SCRIPT_OUTPUT   VARCHAR2(2000)
 )
 ORGANIZATION EXTERNAL
 ( TYPE ORACLE_LOADER
 DEFAULT DIRECTORY dir_sample_data
 ACCESS PARAMETERS
 ( RECORDS DELIMITED BY NEWLINE
 PREPROCESSOR dir_sample_exec:'run_my_script.sh'
 SKIP 1
 LOGFILE dir_sample_log:'dummy.log'
 BADFILE dir_sample_bad:'dummy.bad'
 NODISCARDFILE
 FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
 MISSING FIELD VALUES ARE NULL
 )
 LOCATION (dir_sample_data:'dummy.dat')
 )
 REJECT LIMIT 0
 NOPARALLEL
 NOMONITORING;

as you can see external tables preprocessor is very simple yet very effective way for loading not only files but pretty much any information from unix. I can see where  SELECT * FROM JIRI.SAMPLE_01_EXT can be much easier to use than executing unix scripts directly. Please note that there is no transient file or script created during this process, 1000 users can run this at the same time with no issues of overlapping files or some home made unique file naming (well not sure if you want to run this 1000x but you certainly can) and no files (except log you can suppress) will be created !

I’ve been big fan of Oracle external tables for very long time and I’m very glad to see there are new features coming in 11g R2.

Oracle introduced preprocessor in 11.2 – basically any unix/windows (in Jan 2010 there is no 11.2 on Windows platform yet) script which executes automatically before the external table read.  The output from the script is the source of external table (via stdout in unix). The classic example is processing compressed files – Oracle executes unix script which runs ucompress and the output of this script is the source of external table.

Now you can say that this is nothing exciting, you probably already run uncompress in unix script for all incoming files you load using external tables. Well there is one or better say three main advantages of this new preprocessor.

  • it executes automatically as part of external table read (every time you use SELECT … FROM )
  • it does NOT have to create any objects in the file system – as I mentioned above the output of the script is the source of external table (in unix stdout)
  • the pre-processor by itself generates the data – I already mentioned uncompress, but this can be pretty much any script – for example space available on unix using df -k or list of currently running sessions, … if you are DBA and run unix script to “see” anything for example top sessions, space available, running sessions, …  well you can run simple SELECT * FROM my_external_table to get this info without leaving oracle environment or creating JAVA stored procedures.

to me all features above are huge, (1) – just imagine you have archive folder with all your source files you loaded in past 90 days, all files are compressed to save space and yet you can give access to these files via external tables to selected users without being forced to uncompress them, (2) is great as well simply because there is no need for any additional space nor any temporary staging files (you have 1GB free space and it would take 20GB to uncompress incoming file – no problem)   (3) – well as I mentioned above any df -k or top or  whatever information you need from unix 20x a day, you can get on a fly via external table.

There is however one glitch and huge drawback, the glitch is simple yet important – you need to really understand what you are doing, you are giving users grants to execute unix command(s) -  you need to keep in mind that you are opening possible security hole. The drawback is the fact that you cannot execute this in parallel – in another words if you use PARALLEL in external tables, you need to do some testing and find out whether separate uncompress + parallel load takes longer than preprocessor uncompress+load with no parallel.  You can (if source system in your organization can do it) receive multiple files and load them in parallel (for example 4 incoming compressed files can be uncompressed on a fly using preprocessor and loaded in parallel) – see sample at the bottom.

Here is small how-to … this was tested on RHEL5 + Oracle Release 11.2.0.1.0 in VMWARE 3.0.0

1. Create oracle directories which correspond to existing directories on unix. Please notice one directory gets EXECUTE grant – this is new in 11.2 and basically specifies where the pre-processor command is stored. I created 4 directories, you can of course create just one, I would really recommend to keep the security in mind and assign write/execute grants only and only in folders you need to get these grants on.

 CREATE DIRECTORY dir_sample_data as '/home/jiri/Desktop/sample/data';
 CREATE DIRECTORY dir_sample_log as '/home/jiri/Desktop/sample/log';
 CREATE DIRECTORY dir_sample_bad as '/home/jiri/Desktop/sample/bad';
 CREATE DIRECTORY dir_sample_exec as '/home/jiri/Desktop/sample/execute';
 GRANT READ        on DIRECTORY dir_sample_data to JIRI;
 GRANT READ, WRITE on DIRECTORY dir_sample_log  to JIRI;
 GRANT READ, WRITE on DIRECTORY dir_sample_bad  to JIRI;
 GRANT EXECUTE     on DIRECTORY dir_sample_exec to JIRI;

2. place sample.dat file (see content below) to /home/jiri/Desktop/sample/data directory

CUSTOMER_ID, CUSTOMER_NAME, CUSTOMER_SSN
 123456, JIRI, 444-555-6666
 234567, TOM, 555-666-5555
 345678, CHRIS, 111-222-3333

3. compress it  (I used gzip in this example)

gzip sample.dat

4. place unix script below to /home/jiri/Desktop/sample/execute/run_my_unzip.sh (please notice gunzip output is stdout, not a file – this is important)

#!/bin/bash
 /bin/gunzip -c $1

5. create external table (this is classic comma separated file with header), please notice PREPROCESSOR dir_sample_exec:’run_my_unzip.sh’ you have to specify existing script name, the script has to exist in that directory and you cannot specify any parameters – for example PREPROCESSOR dir_sample_exec:’unzip -c’ will not work and will return error. Although I fully understand this limitation (I believe it is there for security reasons and prevents injecting custom code), this could theoretically open door to very effective decryption – just imagine you receive encrypted files and the passphrase to decrypt the file using private key exists only in oracle (no transient files are created in unix, no SA can decrypt the file, only person with access to DDL can do it – this can be secured even more with oracle vault). I know that there are existing ways to do this, but preprocessor would be a bit easier.

CREATE TABLE JIRI.SAMPLE_01_EXT
 (
 CUSTOMER_ID     NUMBER(6),
 CUSTOMER_NAME   VARCHAR2(60),
 CUSTOMER_SSN    VARCHAR2(60)
 )
 ORGANIZATION EXTERNAL
 ( TYPE ORACLE_LOADER
 DEFAULT DIRECTORY dir_sample_data
 ACCESS PARAMETERS
 ( RECORDS DELIMITED BY NEWLINE
 PREPROCESSOR dir_sample_exec:'run_my_unzip.sh'
 SKIP 1
 LOGFILE dir_sample_log:'sample.log'
 BADFILE dir_sample_bad:'sample.bad'
 NODISCARDFILE
 FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
 MISSING FIELD VALUES ARE NULL
 )
 LOCATION (dir_sample_data:'sample.dat.gz')
 )
 REJECT LIMIT 0
 NOPARALLEL
 NOMONITORING;

as you can see external tables preprocessor is very simple yet very effective for loading files oracle cannot load directly. The most common example would be uncompress or decrypt, but you can of course use more complex scripts and strip footer on a fly or even include FTP as part of your external table load. See my other article no more unix scripts in oracle 11.2 !! for few other a bit unusual samples.

Below is similar external table as above, the only difference is loading in parallel (4 identical compressed files with no header)

CREATE TABLE JIRI.SAMPLE_01_EXT
(
CUSTOMER_ID     NUMBER(6),
CUSTOMER_NAME   VARCHAR2(60),
CUSTOMER_SSN    VARCHAR2(60)
)
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
DEFAULT DIRECTORY dir_sample_data
ACCESS PARAMETERS
( RECORDS DELIMITED BY NEWLINE
PREPROCESSOR dir_sample_exec:'run_my_unzip.sh'
SKIP 0
LOGFILE dir_sample_log:'sample.log'
BADFILE dir_sample_bad:'sample.bad'
NODISCARDFILE
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
MISSING FIELD VALUES ARE NULL
)
LOCATION (dir_sample_data:'sample1.dat.gz', 'sample2.dat.gz', 'sample3.dat.gz','sample4.dat.gz')
)
REJECT LIMIT 0
PARALLEL 4
NOMONITORING;

drop me a note if you have any questions

Older Posts »