I was always envious watching Informix and MS SQL users unload large tables or queries from the database to ASCII with a few simple commands and/or clicks, and I never really understood why Oracle did not implement an easy and fast unload solution. Yes, Oracle 10.2 ships the excellent Data Pump, and unloading for backup purposes is fast and easy, but that is not an ASCII format.
I’m sure the most common solution is (or, better said, was until you started reading this blog) to use SPOOL in SQL*Plus. Although that might be a good solution for some, I don’t like it for two reasons: first, it’s not fast enough; second, you have to run the command in SQL*Plus, which really limits automation (you have to leave PL/SQL just to spool data?).
The core of the code below comes from Adrian Billington – you can read his excellent blog here. I added a little bit of logging and changed some of the calls so you don’t need DBA grants.
The function (yes, it’s a simple function you can call from SQL) uses UTL_FILE for the data spool and a pipelined function for optional parallel spooling (yes, you read that correctly – you can spool in parallel thanks to pipelined function features). Pipelined functions are an amazing thing in general (I have used them for very complex forecasting calculations presented in BI tools that are not aware of stored procedures or functions and need data back fast). Their other great feature is parallelism, which is exactly what is utilized here.
Why do I love this solution?
- it’s very simple SQL: no messy SQL*Plus commands, no special line size or ON/OFF switches
- because it’s SQL, you can execute it from almost anywhere – you can even plug it into PL/SQL
- it has very robust logging, both internal (the result set of the SQL) and optional external (a unix LOG file)
- it’s fast – if you run it in parallel, you can achieve very high speed (I got a genuinely 4x faster dump at parallel 4 with no speed degradation)
CREATE OR REPLACE TYPE dump_ot AS OBJECT
( file_name      VARCHAR2(128)
, directory_name VARCHAR2(128)
, no_records     NUMBER
, session_id     NUMBER
, start_dttm     TIMESTAMP WITH TIME ZONE
, end_dttm       TIMESTAMP WITH TIME ZONE
);
/

CREATE OR REPLACE TYPE dump_ntt AS TABLE OF dump_ot;
/

CREATE OR REPLACE FUNCTION DATA_UNLOAD
( p_source          IN SYS_REFCURSOR,
  p_filename        IN VARCHAR2,
  p_directory       IN VARCHAR2,
  p_unique_filename IN VARCHAR2 DEFAULT 'N',
  p_create_log_file IN VARCHAR2 DEFAULT 'N'
)
RETURN dump_ntt
PIPELINED
PARALLEL_ENABLE (PARTITION p_source BY ANY)
AS
  /* p_source          SQL query you spool
     p_filename        target file name
     p_directory       target Oracle directory
     p_unique_filename create unique file name? Y/N (appends unique SID to p_filename)
                       - useful only for parallel unload
     p_create_log_file create log file? Y/N (creates separate log file and logs every 1mm rows)
                       - has very small performance hit on the spool */
  TYPE row_ntt IS TABLE OF VARCHAR2(32767);
  v_rows       row_ntt;
  v_file       UTL_FILE.FILE_TYPE;
  v_log_file   UTL_FILE.FILE_TYPE;
  v_buffer     VARCHAR2(32767);
  v_sid        VARCHAR2(255);
  v_name       VARCHAR2(255);
  v_lines      PLS_INTEGER := 0;
  v_start_dttm TIMESTAMP WITH TIME ZONE := SYSTIMESTAMP;
  v_end_dttm   TIMESTAMP WITH TIME ZONE;
  v_create_log BOOLEAN := FALSE;
  c_eol        CONSTANT VARCHAR2(1)  := CHR(10);
  c_eollen     CONSTANT PLS_INTEGER := LENGTH(c_eol);
  c_maxline    CONSTANT PLS_INTEGER := 32767;
  c_log_limit  CONSTANT PLS_INTEGER := 1000000;
BEGIN
  v_sid  := LPAD(SYS_CONTEXT('USERENV', 'sid'), 10, '0');
  v_name := p_filename;

  IF TRIM(UPPER(p_create_log_file)) = 'Y' THEN
    v_create_log := TRUE;
  END IF;

  -- add SID (must be used for parallel spooling; single spooling or spooling
  -- across a db-link always creates one file)
  IF UPPER(p_unique_filename) = 'Y' THEN
    v_name := v_name || '_' || TO_CHAR(v_sid);
  END IF;

  v_file := UTL_FILE.FOPEN(p_directory, v_name, 'w', c_maxline);

  IF v_create_log THEN
    v_log_file := UTL_FILE.FOPEN(p_directory, v_name || '.LOG', 'w', c_maxline);
    UTL_FILE.PUT_LINE(v_log_file, TO_CHAR(v_start_dttm, 'DD-MON-YYYY HH24:MI:SS:FF3') || ' --> START');
    UTL_FILE.FFLUSH(v_log_file);
  END IF;

  LOOP
    FETCH p_source BULK COLLECT INTO v_rows LIMIT 1000;

    FOR i IN 1 .. v_rows.COUNT LOOP
      -- pack as many rows as fit into one 32k buffer before writing
      IF LENGTH(v_buffer) + c_eollen + LENGTH(v_rows(i)) <= c_maxline THEN
        v_buffer := v_buffer || c_eol || v_rows(i);
      ELSE
        IF v_buffer IS NOT NULL THEN
          UTL_FILE.PUT_LINE(v_file, v_buffer);
        END IF;
        v_buffer := v_rows(i);
      END IF;
    END LOOP;

    v_lines := v_lines + v_rows.COUNT;

    IF v_create_log AND MOD(v_lines, c_log_limit) = 0 THEN
      UTL_FILE.PUT_LINE(v_log_file, TO_CHAR(SYSTIMESTAMP, 'DD-MON-YYYY HH24:MI:SS:FF3') || ' --> ' || v_lines);
      UTL_FILE.FFLUSH(v_log_file);
    END IF;

    EXIT WHEN p_source%NOTFOUND;
  END LOOP;
  CLOSE p_source;

  -- flush the remaining buffer (guard avoids writing a blank line on an empty spool)
  IF v_buffer IS NOT NULL THEN
    UTL_FILE.PUT_LINE(v_file, v_buffer);
  END IF;
  UTL_FILE.FCLOSE(v_file);
  v_end_dttm := SYSTIMESTAMP;

  IF v_create_log THEN
    UTL_FILE.PUT_LINE(v_log_file, TO_CHAR(v_end_dttm, 'DD-MON-YYYY HH24:MI:SS:FF3') || ' --> ' || v_lines);
    UTL_FILE.PUT_LINE(v_log_file, TO_CHAR(v_end_dttm, 'DD-MON-YYYY HH24:MI:SS:FF3') || ' --> END');
    UTL_FILE.FCLOSE(v_log_file);
  END IF;

  PIPE ROW (dump_ot(v_name, p_directory, v_lines, v_sid, v_start_dttm, v_end_dttm));
  RETURN;
END;
/
all samples below use the Oracle directory DIR_USERS_JIRI; in my case it maps to the unix folder /work/users/jiri
CREATE OR REPLACE DIRECTORY "DIR_USERS_JIRI" AS '/work/users/jiri/';
GRANT READ, WRITE ON DIRECTORY DIR_USERS_JIRI TO JIRI;
and a sample table MYTABLE (this sample table is based on the ALL_TAB_COLUMNS view):
CREATE TABLE MYTABLE AS SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE FROM ALL_TAB_COLUMNS;
pipe delimited file with no header, no log, no parallel mode
SELECT *
FROM TABLE(
       DATA_UNLOAD(
         CURSOR(
           SELECT /*+ NO_PARALLEL(A) */
                  TABLE_NAME || '|' || COLUMN_NAME || '|' || DATA_TYPE
           FROM   MYTABLE A
         ),
         'SAMPLE_SPOOL.TXT',
         'DIR_USERS_JIRI',
         'N',
         'N'
       )
     );

FILE_NAME         DIRECTORY_NAME  NO_RECORDS  SESSION_ID  START_DTTM                           END_DTTM
SAMPLE_SPOOL.TXT  DIR_USERS_JIRI       20796         555  03/18/2009 4:26:39.515437 PM -05:00  03/18/2009 4:26:47.512584 PM -05:00
as you can see, it is a simple SQL statement – it returns the number of rows and the total run time, and you execute it like any other SQL
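A quick sanity check on the unix side is to compare the spooled file's line count with the NO_RECORDS column the SQL returned. A minimal sketch (the /tmp path and two-row file here are just a stand-in for your real spool, e.g. /work/users/jiri/SAMPLE_SPOOL.TXT):

```shell
# simulate a tiny spool file, then count its records the same way
# you would for the real SAMPLE_SPOOL.TXT
printf 'T1|C1|VARCHAR2\nT1|C2|NUMBER\n' > /tmp/sample_spool.txt
wc -l < /tmp/sample_spool.txt   # should match NO_RECORDS (here 2)
```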
pipe delimited file with no header, unix log created, no parallel mode
SELECT *
FROM TABLE(
       DATA_UNLOAD(
         CURSOR(
           SELECT /*+ NO_PARALLEL(A) */
                  TABLE_NAME || '|' || COLUMN_NAME || '|' || DATA_TYPE
           FROM   MYTABLE A
         ),
         'SAMPLE_SPOOL.TXT',
         'DIR_USERS_JIRI',
         'N',
         'Y'
       )
     );

FILE_NAME         DIRECTORY_NAME  NO_RECORDS  SESSION_ID  START_DTTM                           END_DTTM
SAMPLE_SPOOL.TXT  DIR_USERS_JIRI       20796         555  03/18/2009 4:26:39.515437 PM -05:00  03/18/2009 4:26:47.512584 PM -05:00
this is exactly the same as sample 1, except an additional text file is created in the same directory as the main file; it has the same name as the main file, with the extension .LOG. The log file shows the start time, the end time, and a line for every 1mm records. It is useful during very large data spools – sometimes it’s handy to see the progress and how long each 1mm records takes. I did not notice any significant speed degradation compared to sample 1 (spooling each 1mm records can take anywhere from a few seconds to minutes depending on how wide the rows are and how fast your file system is).
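To make the log format concrete, here is a sketch of what a .LOG file looks like (the timestamps and row counts are made up for illustration; the line layout follows the DD-MON-YYYY HH24:MI:SS:FF3 || ' --> ' format mask in the function above):

```shell
# write a sample .LOG file matching the function's format, then peek at
# the end of it the way you would to check progress on a real spool
cat > /tmp/SAMPLE_SPOOL.TXT.LOG <<'EOF'
18-MAR-2009 16:26:39:515 --> START
18-MAR-2009 16:27:12:004 --> 1000000
18-MAR-2009 16:27:44:981 --> 2000000
18-MAR-2009 16:28:01:113 --> 2079600
18-MAR-2009 16:28:01:113 --> END
EOF
tail -n 2 /tmp/SAMPLE_SPOOL.TXT.LOG   # final row count and the END marker
```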
pipe delimited file with no header, unix log created, in parallel mode – this is where it gets interesting and where you really gain the power of the pipelined function.
SELECT *
FROM TABLE(
       DATA_UNLOAD(
         CURSOR(
           SELECT /*+ PARALLEL(A, 2, 1) */
                  TABLE_NAME || '|' || COLUMN_NAME || '|' || DATA_TYPE
           FROM   MYTABLE A
         ),
         'SAMPLE_SPOOL.TXT',
         'DIR_USERS_JIRI',
         'Y',
         'Y'
       )
     );

FILE_NAME                    DIRECTORY_NAME  NO_RECORDS  SESSION_ID  START_DTTM                           END_DTTM
SAMPLE_SPOOL_0000000555.TXT  DIR_USERS_JIRI       10796         555  03/18/2009 4:44:57.141247 PM -05:00  03/18/2009 4:44:58.034521 PM -05:00
SAMPLE_SPOOL_0000000666.TXT  DIR_USERS_JIRI       10000         666  03/18/2009 4:44:57.141264 PM -05:00  03/18/2009 4:44:58.047624 PM -05:00
parallel mode creates X files (in this case 2); the files can be merged at the unix level (or simply left separate). The files will not be exactly the same size – the split is based on Oracle’s internal “best guess”, and yes, it really helps to analyze all underlying tables. The files will, however, preserve full records (one row will not span across 2 files).
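Merging at the unix level is a one-liner. A sketch, simulated here with two small fragments standing in for the per-slave SAMPLE_SPOOL_<SID>.TXT files from the example above:

```shell
# simulate two per-slave spool fragments
printf 'row1\nrow2\n' > /tmp/SAMPLE_SPOOL_0000000555.TXT
printf 'row3\n'       > /tmp/SAMPLE_SPOOL_0000000666.TXT
# concatenate them into a single file; because each fragment holds only
# complete records, no rows are lost or split by the merge
cat /tmp/SAMPLE_SPOOL_*.TXT > /tmp/SAMPLE_SPOOL_MERGED.TXT
wc -l < /tmp/SAMPLE_SPOOL_MERGED.TXT   # 3 = 2 + 1
```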
the SQL will return X rows (in this case 2) showing file names, numbers of records and start/end times
there will be 2 unix log files created (optional)
Please note: if you do not specify p_unique_filename => 'Y' and run the query in parallel, all parallel slaves will spool into one file and overwrite each other (you will not get a correct result)!
other formats – you can easily spool to comma delimited files, add a header, or even use the DOS end-of-line character. It is really up to your SQL statement. Here is a comma delimited file with a header and DOS end-of-line:
SELECT *
FROM TABLE(
       DATA_UNLOAD(
         CURSOR(
           SELECT 'TABLE_NAME,COLUMN_NAME,DATA_TYPE' FROM DUAL
           UNION ALL
           SELECT /*+ NO_PARALLEL(A) */
                  TABLE_NAME || ',' || COLUMN_NAME || ',' || DATA_TYPE || CHR(13)
           FROM   MYTABLE A
         ),
         'SAMPLE_SPOOL.TXT',
         'DIR_USERS_JIRI',
         'N',
         'N'
       )
     );
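The DOS ending works because each row carries a trailing CHR(13) and UTL_FILE.PUT_LINE appends the LF, giving CRLF. You can confirm it on the unix side; a sketch, simulated with one CSV row in /tmp instead of the real spool:

```shell
# write one CRLF-terminated row, the same byte pattern CHR(13) || LF produces
printf 'T1,C1,VARCHAR2\r\n' > /tmp/sample_dos.txt
# count lines that carry a carriage return - all data lines should
grep -c $'\r' /tmp/sample_dos.txt   # 1
```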