The structure of the input records generated by the TSV input format is determined at run time, depending on the data being parsed, and on the values specified for the input format parameters.
The first two input record fields are fixed, and they are described in the following table:
Name | Type | Description |
---|---|---|
Filename | STRING | Full path of the file containing this entry |
RowNumber | INTEGER | Line in the file containing this entry |
The number of fields detected by the TSV input format during the
initial inspection phase dictates how the record fields will be
extracted from the input data during the subsequent parsing
stage.
If a line contains fewer fields than the number of fields
established, the missing fields are returned as NULL values.
On the other hand, if a line contains more fields than the number
of fields established, the extra fields are parsed as if they were
part of the value of the last field expected by the TSV input
format.
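The two rules above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not the TSV input format's actual implementation: the function name `split_record` is hypothetical, `None` stands in for NULL, and the sketch assumes that the separator characters between surplus fields are kept inside the last field's value.

```python
from typing import List, Optional


def split_record(line: str, n_fields: int, sep: str = "\t") -> List[Optional[str]]:
    """Split one line into exactly n_fields values (n_fields must be >= 1)."""
    # Limiting the number of splits leaves any surplus fields, separators
    # included, inside the value of the last expected field (an assumption).
    values: List[Optional[str]] = line.rstrip("\r\n").split(sep, n_fields - 1)
    # Lines that are too short are padded with None, standing in for NULL.
    values += [None] * (n_fields - len(values))
    return values


print(split_record("Jeff\tRedmond", 3))
print(split_record("Steve\tSeattle\t206\t98101", 3))
```

With three expected fields, the first call pads the missing third value, and the second call folds "98101" into the third field.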
When the "nFields" parameter is set to -1, the TSV input format
determines the number of fields by inspecting the first line of the
input data, or the first line of the header file specified with the
"iHeaderFile" parameter.
As an example, the following TSV file contains a variable number of
fields:

    Name	City	AreaCode
    Jeff	Redmond	425
    Steve	Seattle	206	98101
    Edward	Olympia	360

When parsed with the "nFields" parameter set to -1, this TSV file would yield three fields ("Name", "City", and "AreaCode").
When the "nFields" parameter is set to a value greater than zero, the TSV input format uses the specified value as the number of fields in the input data. Considering again the previous example file, parsing the file with the "nFields" parameter set to 4 would yield four fields.
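As a rough model of how the field count is established, the following Python sketch mirrors the two "nFields" cases. The function `detect_field_count` and its arguments are hypothetical names chosen for illustration. The sketch also assumes that a header file, when supplied, is inspected instead of the input data, and it rejects values other than -1 or a positive number because the text does not describe them.

```python
from typing import List, Optional


def detect_field_count(data_lines: List[str], n_fields: int = -1,
                       header_file_lines: Optional[List[str]] = None,
                       sep: str = "\t") -> int:
    """Return the number of fields used for every record."""
    if n_fields > 0:
        # An explicit value is used as-is, whatever the data looks like.
        return n_fields
    if n_fields != -1:
        # Other values are not covered by the text; rejected in this sketch.
        raise ValueError("n_fields must be -1 or greater than zero")
    # Assumption: a header file, when given, is inspected instead of the data.
    source = header_file_lines if header_file_lines else data_lines
    return len(source[0].rstrip("\r\n").split(sep))


sample = [
    "Name\tCity\tAreaCode",
    "Jeff\tRedmond\t425",
    "Steve\tSeattle\t206\t98101",
    "Edward\tOlympia\t360",
]
print(detect_field_count(sample))              # 3
print(detect_field_count(sample, n_fields=4))  # 4
```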
When the "headerRow" parameter is set to "ON", the TSV input
format assumes that the first line in the file being parsed is a
header containing the field names.
In this case, if the "iHeaderFile" parameter is left unspecified,
the TSV input format extracts the field names from the header
line.
On the other hand, if the "iHeaderFile" parameter is set to the
path of a TSV file containing at least one line, then the TSV input
format assumes that the specified file contains a header, parses
its first line only, and extracts the field names from this line,
ignoring the first line of the file being parsed.
If the number of field names extracted is less than the number of fields detected, the additional fields are automatically named "FieldN", with N being a progressive index indicating the field position in the input record.
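The naming rule can be illustrated with a short Python sketch. The helper `name_fields` is a hypothetical name, and the handling of a header with more names than detected fields (surplus names are dropped here) is an assumption, since the text does not cover that case.

```python
from typing import List


def name_fields(header_names: List[str], n_fields: int) -> List[str]:
    """Return n_fields names, filling gaps with automatic "FieldN" names."""
    # Assumption: names beyond the detected field count are ignored.
    names = list(header_names[:n_fields])
    # N is the 1-based position of the field in the input record.
    names += [f"Field{n}" for n in range(len(names) + 1, n_fields + 1)]
    return names


print(name_fields(["Name", "City", "AreaCode"], 4))
# ['Name', 'City', 'AreaCode', 'Field4']
print(name_fields([], 3))
# ['Field1', 'Field2', 'Field3']
```

Passing an empty list of header names models the case in which no header is available at all, so every field receives an automatic name.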
Considering the previous example file, setting the "headerRow"
parameter to "ON" would cause the TSV input format to use the first
line of the file as a header containing the field names.
With the "nFields" parameter set to -1, the TSV input format would
detect three fields, whose names would be "Name", "City", and
"AreaCode".
On the other hand, with the "nFields" parameter set to 4, the TSV
input format would detect four fields, named "Name", "City",
"AreaCode", and "Field4".
When the "headerRow" parameter is set to "OFF", the TSV input
format assumes that the file being parsed does not contain a
header, and that its first line is the first data record in the
file.
In this case, if the "iHeaderFile" parameter is set to the path of
a TSV file containing at least one line, then the TSV input format
assumes that the specified file contains a header, parses its first
line only, and extracts the field names from this line.
On the other hand, if the "iHeaderFile" parameter is left
unspecified, the fields are automatically named "FieldN",
with N being a progressive number indicating the field
position in the input record.
As an example, the following TSV file does not contain a header line:

    Jeff	Redmond	425
    Steve	Seattle	206
    Edward	Olympia	360

When parsed with the "headerRow" parameter set to "OFF", the TSV input format assumes that the first line of the TSV file is the first data record in the file. In this case, the three fields would be named "Field1", "Field2", and "Field3".
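The interaction between "headerRow" and "iHeaderFile" can be summarized as a small decision procedure. The Python sketch below is a simplified model rather than the actual implementation: `resolve_header` is a hypothetical name, the header file is represented by a list of its lines, and an empty list of names signals that automatic "FieldN" names are needed.

```python
from typing import List, Optional, Tuple


def resolve_header(data_lines: List[str], header_row: bool,
                   header_file_lines: Optional[List[str]] = None,
                   sep: str = "\t") -> Tuple[List[str], List[str]]:
    """Return (field names taken from a header, lines holding data records)."""
    # With headerRow ON, the first line of the parsed file is never a record.
    records = data_lines[1:] if header_row else list(data_lines)
    if header_file_lines:
        # Only the first line of the header file is parsed.
        header_line = header_file_lines[0]
    elif header_row and data_lines:
        header_line = data_lines[0]
    else:
        # No header anywhere: the caller falls back to Field1, Field2, ...
        return [], records
    return header_line.rstrip("\r\n").split(sep), records


data = ["Jeff\tRedmond\t425", "Steve\tSeattle\t206", "Edward\tOlympia\t360"]
names, records = resolve_header(data, header_row=False)
print(names, len(records))  # [] 3
```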
Empty field values are returned as NULL values.
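For illustration only, the mapping of empty values to NULL might look like the following in Python. The function name `parse_values` is hypothetical, `None` stands in for NULL, and the sketch assumes that only zero-length values count as empty; whether whitespace-only values are treated the same way is not stated in the text.

```python
from typing import List, Optional


def parse_values(line: str, sep: str = "\t") -> List[Optional[str]]:
    """Split a line on the separator, mapping empty values to None (NULL)."""
    return [value if value != "" else None
            for value in line.rstrip("\r\n").split(sep)]


print(parse_values("Jeff\t\t425"))  # ['Jeff', None, '425']
```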
© 2004 Microsoft Corporation. All rights reserved.