I’ve been a big fan of Oracle external tables for a very long time and I’m very glad to see new features coming in 11g R2.
Oracle introduced the preprocessor in 11.2 – basically any unix/windows script (as of Jan 2010 there is no 11.2 on the Windows platform yet) which executes automatically before the external table is read. The output of the script becomes the source of the external table (via stdout on unix). The classic example is processing compressed files – Oracle executes a unix script which runs uncompress, and the output of that script is the source of the external table.
Now you may say this is nothing exciting – you probably already run uncompress in a unix script for all incoming files you load using external tables. Well, this new preprocessor has one, or better said three, main advantages:
- it executes automatically as part of the external table read (every time you run SELECT … FROM)
- it does NOT have to create any objects in the file system – as mentioned above, the output of the script is the source of the external table (stdout on unix)
- the preprocessor itself can generate the data – I already mentioned uncompress, but this can be pretty much any script, for example space available on unix via df -k, or a list of currently running sessions. If you are a DBA and run unix scripts to “see” anything – top sessions, available space, running sessions – you can run a simple SELECT * FROM my_external_table to get this info without leaving the oracle environment or creating Java stored procedures.
To me all three features above are huge. (1) Just imagine you have an archive folder with all the source files you loaded in the past 90 days, all compressed to save space – you can still give selected users access to these files via external tables without being forced to uncompress them. (2) is great as well, simply because there is no need for any additional space or temporary staging files (you have 1GB of free space and it would take 20GB to uncompress the incoming file – no problem). (3) – as I mentioned above, any df -k or top output, or whatever information you need from unix 20 times a day, you can get on the fly via an external table.
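As a quick illustration of point (3), here is a minimal sketch of such a preprocessor script – the file name and the column layout are my own invention, not something from Oracle:

```shell
#!/bin/bash
# run_my_df.sh - hypothetical preprocessor script.
# df -kP prints POSIX-format output (one line per filesystem, no wrapping);
# awk skips the header line and emits comma separated rows:
# filesystem,kb_total,kb_used,kb_avail,mount
# In a real preprocessor script use absolute paths (like /bin/df),
# since Oracle invokes the script with a minimal environment.
df -kP | awk 'NR > 1 { print $1 "," $2 "," $3 "," $4 "," $6 }'
```

An external table defined over this script would turn SELECT * FROM my_df_ext into a live df report – no unix login needed.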
There is however one glitch and one huge drawback. The glitch is simple yet important – you need to really understand what you are doing: you are granting users the ability to execute unix command(s), so keep in mind that you may be opening a security hole. The drawback is the fact that the preprocessor cannot execute in parallel – in other words, if you use PARALLEL on external tables, you need to do some testing and find out whether a separate uncompress + parallel load takes longer than preprocessor uncompress + load with no parallelism. You can (if the source system in your organization can do it) receive multiple files and load them in parallel (for example 4 incoming compressed files can be uncompressed on the fly using the preprocessor and loaded in parallel) – see the sample at the bottom.
Here is a small how-to … this was tested on RHEL5 + Oracle Release 11.2.0.1.0 in VMware 3.0.0
1. Create oracle directories which correspond to existing directories on unix. Notice that one directory gets the EXECUTE grant – this is new in 11.2 and specifies where the preprocessor command is stored. I created 4 directories; you can of course create just one, but I would really recommend keeping security in mind and granting write/execute only on the folders that truly need them.
CREATE DIRECTORY dir_sample_data as '/home/jiri/Desktop/sample/data';
CREATE DIRECTORY dir_sample_log  as '/home/jiri/Desktop/sample/log';
CREATE DIRECTORY dir_sample_bad  as '/home/jiri/Desktop/sample/bad';
CREATE DIRECTORY dir_sample_exec as '/home/jiri/Desktop/sample/execute';
GRANT READ on DIRECTORY dir_sample_data to JIRI;
GRANT READ, WRITE on DIRECTORY dir_sample_log to JIRI;
GRANT READ, WRITE on DIRECTORY dir_sample_bad to JIRI;
GRANT EXECUTE on DIRECTORY dir_sample_exec to JIRI;
2. place the sample.dat file (see content below) in the /home/jiri/Desktop/sample/data directory
CUSTOMER_ID, CUSTOMER_NAME, CUSTOMER_SSN
123456, JIRI, 444-555-6666
234567, TOM, 555-666-5555
345678, CHRIS, 111-222-3333
3. compress it (I used gzip in this example)
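The compression step is a one-liner; a quick sketch, assuming the paths from step 2:

```shell
# gzip replaces sample.dat with sample.dat.gz in place
gzip /home/jiri/Desktop/sample/data/sample.dat

# sanity check - gunzip -c streams the content to stdout,
# which is exactly what the preprocessor will do for oracle
gunzip -c /home/jiri/Desktop/sample/data/sample.dat.gz | head -1
```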
4. place the unix script below at /home/jiri/Desktop/sample/execute/run_my_unzip.sh (notice the gunzip output goes to stdout, not to a file – this is important)
#!/bin/bash
/bin/gunzip -c "$1"
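Before wiring the script into a table, it is worth running it by hand exactly the way Oracle will – with the location file path as the first argument (paths assumed from the earlier steps):

```shell
# the oracle OS user must be able to execute the script
chmod +x /home/jiri/Desktop/sample/execute/run_my_unzip.sh

# simulate what oracle does: the location file is passed as $1
# and the uncompressed rows must appear on stdout
/home/jiri/Desktop/sample/execute/run_my_unzip.sh \
    /home/jiri/Desktop/sample/data/sample.dat.gz
```

If you see the original comma separated rows on the screen, the external table read will work too.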
5. create the external table (this is a classic comma separated file with header). Notice PREPROCESSOR dir_sample_exec:'run_my_unzip.sh' – you have to specify an existing script name, the script has to exist in that directory, and you cannot specify any parameters – for example PREPROCESSOR dir_sample_exec:'unzip -c' will not work and will return an error. Although I fully understand this limitation (I believe it is there for security reasons and prevents injecting custom code), this could theoretically open the door to very effective decryption – just imagine you receive encrypted files and the passphrase to decrypt the file using a private key exists only in oracle (no transient files are created in unix, no SA can decrypt the file, only a person with access to the DDL can do it – and this can be secured even further with Oracle Vault). I know there are existing ways to do this, but the preprocessor would make it a bit easier.
CREATE TABLE JIRI.SAMPLE_01_EXT
( CUSTOMER_ID    NUMBER(6),
  CUSTOMER_NAME  VARCHAR2(60),
  CUSTOMER_SSN   VARCHAR2(60)
)
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
  DEFAULT DIRECTORY dir_sample_data
  ACCESS PARAMETERS
  ( RECORDS DELIMITED BY NEWLINE
    PREPROCESSOR dir_sample_exec:'run_my_unzip.sh'
    SKIP 1
    LOGFILE dir_sample_log:'sample.log'
    BADFILE dir_sample_bad:'sample.bad'
    NODISCARDFILE
    FIELDS TERMINATED BY ','
    OPTIONALLY ENCLOSED BY '"'
    MISSING FIELD VALUES ARE NULL
  )
  LOCATION (dir_sample_data:'sample.dat.gz')
)
REJECT LIMIT 0
NOPARALLEL
NOMONITORING;
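Once the table exists, reading the compressed file is just a query; a usage sketch (the WHERE clause is only an example):

```sql
-- each SELECT runs run_my_unzip.sh and streams sample.dat.gz on the fly
SELECT customer_id, customer_name
  FROM jiri.sample_01_ext
 WHERE customer_name = 'JIRI';
```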
As you can see, the external table preprocessor is very simple yet very effective for loading files oracle cannot load directly. The most common examples would be uncompress or decrypt, but you can of course use more complex scripts and strip a footer on the fly, or even include FTP as part of your external table load. See my other article no more unix scripts in oracle 11.2 !! for a few other, somewhat unusual samples.
Below is a similar external table to the one above; the only difference is that it loads in parallel (4 identical compressed files with no header).
CREATE TABLE JIRI.SAMPLE_01_EXT
( CUSTOMER_ID    NUMBER(6),
  CUSTOMER_NAME  VARCHAR2(60),
  CUSTOMER_SSN   VARCHAR2(60)
)
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
  DEFAULT DIRECTORY dir_sample_data
  ACCESS PARAMETERS
  ( RECORDS DELIMITED BY NEWLINE
    PREPROCESSOR dir_sample_exec:'run_my_unzip.sh'
    SKIP 0
    LOGFILE dir_sample_log:'sample.log'
    BADFILE dir_sample_bad:'sample.bad'
    NODISCARDFILE
    FIELDS TERMINATED BY ','
    OPTIONALLY ENCLOSED BY '"'
    MISSING FIELD VALUES ARE NULL
  )
  LOCATION (dir_sample_data:'sample1.dat.gz', 'sample2.dat.gz', 'sample3.dat.gz', 'sample4.dat.gz')
)
REJECT LIMIT 0
PARALLEL 4
NOMONITORING;
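How those 4 input files get produced is up to the source system; as a rough sketch (the names big_extract.dat and chunk_ are mine, not from the article), a single large extract could be cut into 4 line-based chunks and compressed on the unix side:

```shell
#!/bin/bash
# split the extract into 4 roughly equal chunks: chunk_aa .. chunk_ad
lines=$(wc -l < big_extract.dat)
split -l $(( (lines + 3) / 4 )) big_extract.dat chunk_

# compress each chunk so the preprocessor can gunzip it on the fly
for f in chunk_*; do
    gzip "$f"
done
```

With 4 compressed files in LOCATION and PARALLEL 4, each parallel slave gets its own file to uncompress and load.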
Drop me a note if you have any questions.