I need to control the data type when reading XML data in SAS. The XML data are written and accessed using the XML libname engine in SAS.
SAS seems to guess the data type based on the contents of a column: If I write "20081002" to my XML data in a character column, it will be read back in as a numerical variable.
An example:
filename my_xml '/tmp/my.xml'; * Yes, I use SAS on Unix *;
libname my_xml XML;
data my_xml.data_type_test;
text_char="This is obviously text";
date_char="20081002";
num_char="42";
genuine_num=42;
run;
proc copy inlib=my_xml outlib=WORK;
run;
libname my_xml;
filename my_xml CLEAR;
Only the last column is defined as numerical data type in the XML data, but when I copy it into my WORK library, only the column *text_char* is character. The other 3 are now numeric.
How can I control the data type when reading XML data in SAS?
-
I think you need to define some xml specific options whith your libname XML statement for export go:
libname my_xml_out XML XMLMETA=SCHEMADATA;
To include the data schema. Also, you might want to save the XML schema to a separate file for later import:
libname my_xml_in XML XMLSCHEMA='external-file'
after you exported the schema using
XMLMETA=SCHEMA
of course. I think this is the documentation you need.Apart from that liberal use of format statements on original dataset creation is recommended.
Martin Bøgelund : Thanks for the input. Unfortunately, I can't get it to work - SAS still claims that my character data that look like numbers, are of the NUM data type, when read from XML using XMLSCHEMA. -
Take a look at the SAS XML Mapper. It allows you to create a map to read (and wrte in 9.2) XML files and specifying column attributes.
If this is your XML file:
This is obviously text 20081002 42 42
You could create a MAP like this:
<!-- ############################################################ --> <TABLE name="DATA_TYPE_TEST"> <TABLE-PATH syntax="XPath">/TABLE/DATA_TYPE_TEST</TABLE-PATH> <COLUMN name="text_char"> <PATH syntax="XPath">/TABLE/DATA_TYPE_TEST/text_char</PATH> <TYPE>character</TYPE> <DATATYPE>string</DATATYPE> <LENGTH>22</LENGTH> </COLUMN> <COLUMN name="date_char"> <PATH syntax="XPath">/TABLE/DATA_TYPE_TEST/date_char</PATH> <TYPE>numeric</TYPE> <DATATYPE>integer</DATATYPE> <FORMAT width="9">DATE</FORMAT> <INFORMAT width="8">ND8601DA</INFORMAT> </COLUMN> <COLUMN name="num_char"> <PATH syntax="XPath">/TABLE/DATA_TYPE_TEST/num_char</PATH> <TYPE>character</TYPE> <DATATYPE>string</DATATYPE> <LENGTH>2</LENGTH> </COLUMN> <COLUMN name="genuine_num"> <PATH syntax="XPath">/TABLE/DATA_TYPE_TEST/genuine_num</PATH> <TYPE>numeric</TYPE> <DATATYPE>integer</DATATYPE> </COLUMN> </TABLE>
And then read the XML file:
filename my 'C:\temp\my.xml'; filename SXLEMAP 'C:\temp\MyMap.map'; libname my xml xmlmap=SXLEMAP access=READONLY; title 'Table DATA_TYPE_TEST'; proc contents data=my.DATA_TYPE_TEST varnum; run; proc print data=my.DATA_TYPE_TEST(obs=10); run;
Result:
Table DATA_TYPE_TEST The CONTENTS Procedure Data Set Name MY.DATA_TYPE_TEST Observations Member Type DATA Variables 4 Engine XML Indexes 0 Created . Observation Length 0 Last Modified . Deleted Observations 0 Protection Compressed NO Data Set Type Sorted NO Label Data Representation Default Encoding Default Variables in Creation Order # Variable Type Len Format Informat Label 1 text_char Char 22 $22. $22. text_char 2 date_char Num 8 DATE9. ND8601DA8. date_char 3 num_char Char 2 $2. $2. num_char 4 genuine_num Num 8 F8. F8. genuine_num Table DATA_TYPE_TEST genuine_ Obs text_char date_char num_char num 1 This is obviously text 02OCT2008 42 42
0 comments:
Post a Comment