The structure of the input records generated by the CSV input format is determined at run time, depending on the data being parsed, and on the values specified for the input format parameters.
The first two input record fields are fixed, and they are described in the following table:
Name | Type | Description |
---|---|---|
Filename | STRING | Full path of the file containing this entry |
RowNumber | INTEGER | Line in the file containing this entry |
The number of fields detected by the CSV input format during the
initial inspection phase dictates how the CSV record fields will be
extracted from the input data during the subsequent parsing
stage.
If a CSV line contains less fields than the number of fields
established, the missing fields are returned as NULL values.
On the other hand, if a CSV line contains more fields than the
number of fields established, the extra fields are parsed as if
they were part of the value of the last field expected by the CSV
input format.
When the "nFields" parameter is set to -1, the CSV input format
determines the number of fields by inspecting the input CSV
data.
If the "fixedFields" parameter is set to "ON", indicating that all
the rows in the CSV file have the same fixed number of fields, then
the number of fields is determined by parsing either the first line
of the CSV input data, or the first line of the header file
specified with the "iHeaderFile" parameter.
On the other hand, if the "fixedFields" parameter is set to "OFF",
indicating that the rows in the CSV file have a variable number of
fields, then the number of fields is assumed to be the largest
number of fields found among the first n lines of the CSV
input data (eventually including the first line of the header file
specified with the "iHeaderFile" parameter), where n is the
value of the "dtLines" parameter.
As an example, the following CSV file contains a variable number of fields:
Name, City, AreaCode Jeff, Redmond, 425 Steve, Seattle, 206, 98101 Edward, Olympia, 360When parsed with the "nFields" parameter set to -1 and the "fixedFields" parameter set to "ON", this CSV file would yield three fields ("Name", "City", and "AreaCode").
When the "nFields" parameter is set to a value greater than
zero, the CSV input format uses the specified value as the number
of fields in the input data.
However, if the "fixedFields" parameter is set to "OFF", indicating
that the rows in the CSV file have a variable number of fields,
then the CSV input format uses the value of the "nFields" parameter
as a "suggested minimum" number of fields, and it examines the
first n lines of the CSV input data (eventually including
the first line of the header file specified with the "iHeaderFile"
parameter), where n is the value of the "dtLines" parameter,
to determine the number of fields among these lines.
If lines are found containing more fields than the value specified
for the "nFields" parameter, then the number of fields is adjusted
to the largest number of fields found among the first n
lines.
Considering again the previous CSV example file, parsing the
file with the "nFields" parameter set to 3 and the "fixedFields"
parameter set to "ON" would yield three fields.
However, setting the "fixedFields" parameter to "OFF" and the
"dtLines" parameter to any value greater than 2 would yield four
fields, detecting the extra field in the second record.
When the "headerRow" parameter is set to "ON", the CSV input
format assumes that the first line in the CSV file being parsed is
a header containing the field names.
In this case, if the "iHeaderFile" parameter is left unspecified,
the CSV input format extracts the field names from the header
line.
On the other hand, if the "iHeaderFile" parameter is set to the
path of a CSV file containing at least one line, then the CSV input
format assumes that the specified file contains a header, parses
its first line only, and extracts the field names from this line,
ignoring the first line of the CSV file being parsed.
If the number of field names extracted is less than the number of fields detected, the additional fields are automatically named "FieldN", with N being a progressive index indicating the field position in the input record.
Considering the previous example CSV file, setting the
"headerRow" parameter to "ON" would cause the CSV input format to
use the first line of the CSV file as a header containing the field
names.
With the "fixedFields" parameter set to "ON", the CSV input format
would detect three fields, whose names would be "Name", "City", and
"AreaCode".
On the other hand, with the "fixedFields" parameter set to "OFF",
the CSV input format would detect four fields, named "Name",
"City", "AreaCode", and "Field4".
When the "headerRow" parameter is set to "OFF", the CSV input
format assumes that the CSV file being parsed does not contain a
header, and that its first line is the first data record in the
file.
In this case, if the "iHeaderFile" parameter is set to the path of
a CSV file containing at least one line, then the CSV input format
assumes that the specified file contains a header, parses its first
line only, and extracts the field names from this line.
On the other hand, if the "iHeaderFile" parameter is left
unspecified, the fields are automatically named "FieldN",
with N being a progressive number indicating the field
position in the input record.
As an example, the following CSV file does not contain a header line:
Jeff, Redmond, 425 Steve, Seattle, 206 Edward, Olympia, 360When parsed with the "headerRow" parameter to "OFF", the CSV input format assumes that the first line of the CSV file is the first data record in the file. In this case, the three fields would be named "Field1", "Field2", and "Field3".
Empty field values are returned as NULL values.
© 2004 Microsoft Corporation. All rights reserved.