reading file with header and footer
I would like to dedicate Part 2 of the series about Oracle External Tables to reading files with header and footer ( as part 1, all these samples were tested on 10.2.0.4 on Sun10).
I love files with header and footer and really try to use it every time I can. This is probably personal preference and some developers like plain file with no header/no footer and separate control file (either ascii or XML). The reason why I like file with header and footer is very simple
you transfer only one file – sometimes less is better 🙂
it is very easy to see whether you received partial file just by looking at the TAIL of the file (think about your production support team during your development)
if the header/tail is done correctly and follows the same standards for all files, you get not only the actual data but also job name, when the job ran, who ran it, … in one single file (there is nothing to loose when you move files from folder to folder)
of course, there are some disadvantages as well …
speed (it is faster to load separate control file – see below)
you can add structured information to the control file (if you use XML) – just imagine that all your incoming files have uniform XML format control file
if you want to use control file(s) for generic statistics (some companies create centralized control files ETL and load all control files to one repository) – parsing footer is always more complex. If you ask why they would do that, then the answer is simple – this gives you excellent view on all your incoming files (sizes, timings, ….) – you can see how things change over time and plan space and network utilization better.
in general I try to follow simple formula LESS CODING = SMALLER ROOM FOR ERROR
there are 3 solutions for reading files with header and footer using oracle external tables
1. use unix level tool and remove footer (and header if you wish)
I don’t like this method because it requires “touching” the file. I prefer to keep the file in a raw state without any manipulations (sometimes this is not possible of course, sometimes unix tools like syncsort perform much faster – it really depends of file structure).
2. reject limit
The other solution is simply setting SKIP 1 (for header) and REJECT LIMIT 1 (for footer) – let oracle reject one row (footer) every time you read the data. I really don’t want to say that I don’t use this method, because I do, but I really try to avoid it. Just the idea of blindly rejecting 1 row every time you read the file does not make me feel good.
3. LOAD WHEN
This is my preferred method. You basically use WHERE clause (called LOAD WHEN) and load only records you want to load (skip HEADER and FOOTER or whatever records you want).
here is small sample …
I will base the sample on 3 oracle directories which map to three unix (or windows) directories. I always recommend to separate actual data directory from LOG/BAD directories. This is very good security practice – oracle ID can have READ access to data files and READ/WRITE to LOG/BAD files. All my samples will use schema JIRI
CREATE OR REPLACE DIRECTORY "DIR_USERS_JIRI_DATA" AS '/work/users/jiri/data'; CREATE OR REPLACE DIRECTORY "DIR_USERS_JIRI_LOG" AS '/work/users/jiri/log'; CREATE OR REPLACE DIRECTORY "DIR_USERS_JIRI_BAD" AS '/work/users/jiri/bad'; GRANT READ, WRITE ON DIRECTORY DIR_USERS_JIRI_DATA TO JIRI; GRANT READ ON DIRECTORY DIR_USERS_JIRI_LOG TO JIRI; GRANT READ ON DIRECTORY DIR_USERS_JIRI_BAD TO JIRI;
let’s assume you have simple ASCII comma separated file with LF (unix) end-of-line character, the file has header row, you can download sample file from here
CREATE TABLE JIRI.SAMPLE_02_BODY_EXT ( ROW_TYPE VARCHAR2(1), PRODUCT VARCHAR2(60), NUMBER_OF_UNITS NUMBER(10,0), TOTAL_PRICE NUMBER(10,2) ) ORGANIZATION EXTERNAL ( TYPE ORACLE_LOADER DEFAULT DIRECTORY DIR_USERS_JIRI_DATA ACCESS PARAMETERS ( RECORDS DELIMITED BY '\n' LOAD WHEN ( ROW_TYPE = 'B') SKIP 0 LOGFILE DIR_USERS_JIRI_LOG:'sample_02.log' BADFILE DIR_USERS_JIRI_BAD:'sample_02.bad' NODISCARDFILE FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' MISSING FIELD VALUES ARE NULL ) LOCATION (DIR_USERS_JIRI_DATA:'sample_02.dat') ) REJECT LIMIT 0 NOPARALLEL NOMONITORING;CREATE TABLE JIRI.SAMPLE_02_TAIL_EXT ( ROW_TYPE VARCHAR2(1), JOB_NM VARCHAR2(255), EXECUTED_BY VARCHAR2(255), JOB_STATED_ON VARCHAR2(255), SOURCE_SERVER_NM VARCHAR2(255) ) ORGANIZATION EXTERNAL ( TYPE ORACLE_LOADER DEFAULT DIRECTORY DIR_USERS_JIRI_DATA ACCESS PARAMETERS ( RECORDS DELIMITED BY '\n' LOAD WHEN ( ROW_TYPE = 'T') SKIP 0
This is very clean and easy way how to load data as well as trailing record(s) without using reject and any unix tool to strip the record.
Please note that I did not have to use SKIP for skipping header row. Also I used excape characters – if you run samples above on unix server, you can use RECORDS DELIMITED BY NEWLINE instead.
As I mentioned before, there is a drawback of LOAD WHEN solution. If you have very large file then reading SAMPLE_02_TAIL_EXT might take long time (much longer than using separate control file).Advertisements