Question

XLS to GDB: some integer values in sheet show up with decimals in attribute table


Badge

Hi,

I'm converting excel sheets to gdb feature classes and I'm not sure why but some of the values in my attribute table are different from the actual values in the sheets.

Example images:

This wouldn't be a huge deal except for the fact that the fields with the decimal values are being recognized as text. I have to create new numerical fields and migrate the data using the field calculator and would like to prevent this somehow.

 

There isn't any formatting in the cells, and when I do migrate the data to a new field, the values revert back to what they are in the excel sheets. I've tried copy/pasting values into a new worksheet and end up with the same thing in my attribute table.

Has anyone seen this before?

 

Thanks!


14 replies

Badge

Hi @badams

this must be a float vs. string issue. I can provide a bit more details if you are interested, but briefly, it is about how numeric values are stored/represented.

What is the destination geodatabase attribute data type? Is it numeric? And how are the source values read? Does the source attribute have data type number? What is its precision (please check Add Reader > Parameters dialog)? Could you please try reading the values with precision 0?

Badge

Hi @badams

this must be a float vs. string issue. I can provide a bit more details if you are interested, but briefly, it is about how numeric values are stored/represented.

What is the destination geodatabase attribute data type? Is it numeric? And how are the source values read? Does the source attribute have data type number? What is its precision (please check Add Reader > Parameters dialog)? Could you please try reading the values with precision 0?

Hi @LenaAtSafe

 

Thanks for your answer.

 

When I view the Reader's Parameters, the precision for the columns (that show up with the decimals) is greyed out. I will be using this workspace to convert hundreds of xls files to gdb feature classes and am looking for a solution that will not cause me to have to do any clicking hundreds of times.
Badge
Hi @LenaAtSafe

 

Thanks for your answer.

 

When I view the Reader's Parameters, the precision for the columns (that show up with the decimals) is greyed out. I will be using this workspace to convert hundreds of xls files to gdb feature classes and am looking for a solution that will not cause me to have to do any clicking hundreds of times.
Could you please share your workspace and sample dataset? You definitely won't be doing hundreds of clicks - it is not an option. We need to find out what can be done slightly differently in your workspace set up.

 

I probably should have asked what FME version/build you are using. Have you upgraded to FME 2017 or are you using an older FME?

 

 

Badge
@LenaAtSafe

sample-data.xls

 

 

And this is how I created my workspace:

Generate dynamic workspace

 

In the reader Format Attributes, expose both fme_feature_type and fme_basename

In the writer parameters dialogue, set Feature Class or Table Name to fme_basename and Schema Definition Name to fme_feature_type. Set Geometry to _point

Add VertexCreator transformer and set the X/Y values to Easting/Northing & select mode as replace with point

Set the Rejected Feature Handling to Continue Translation

 

 

I just tried it again using only Sample_Data.xls and the Fe_INAA_ppm field was the only text field that should have been numerical. When I was doing it before, both Fe and Ca_INAA_ppm were coming up as text because of the decimal issue.

 

 

Badge

Hi @badams

this must be a float vs. string issue. I can provide a bit more details if you are interested, but briefly, it is about how numeric values are stored/represented.

What is the destination geodatabase attribute data type? Is it numeric? And how are the source values read? Does the source attribute have data type number? What is its precision (please check Add Reader > Parameters dialog)? Could you please try reading the values with precision 0?

@LenaAtSafeI'm using FME 2017. The destination attribute data type is numeric. The source attribute has the type charautomatically, but I'd like to have the type be number automatically.

 

 

 

Badge

Hi @badams

thank you for the data sample. I will file a PR to investigate why FME reads Fe_INAA_ppm (as well as Ca_INAA_ppm in dynamic mode) as char attribute(s). I guess the problem is caused by the combination of Excel cell formatting (number with decimal places) and original values (integers).

Meanwhile, you could try altering the schema. Please take a look at the attached workspace which alters attribute{} list attribute that describes and defines the schema and then merges the list back onto the features to be used by the GDB Writer:

For more information about schema from attribute please check Dynamic Workflows: Destination Schema is Derived from a Schema Feature .

Badge

Hi @badams

thank you for the data sample. I will file a PR to investigate why FME reads Fe_INAA_ppm (as well as Ca_INAA_ppm in dynamic mode) as char attribute(s). I guess the problem is caused by the combination of Excel cell formatting (number with decimal places) and original values (integers).

Meanwhile, you could try altering the schema. Please take a look at the attached workspace which alters attribute{} list attribute that describes and defines the schema and then merges the list back onto the features to be used by the GDB Writer:

For more information about schema from attribute please check Dynamic Workflows: Destination Schema is Derived from a Schema Feature .

Hi @LenaAtSafe

 

 

Thanks so much! Would you be able to explain how you set up the schema altering transformers? I just gave it (kind of blindly) a try and I am still getting Ca_INAA_ppm and Fe_INAA_ppm as text fields.
Badge
Hi @LenaAtSafe

 

 

Thanks so much! Would you be able to explain how you set up the schema altering transformers? I just gave it (kind of blindly) a try and I am still getting Ca_INAA_ppm and Fe_INAA_ppm as text fields.
Hi @badams

 

 

sorry, I should have mentioned this explicitly: you need to set up the Writer as suggested in the Dynamic Workflow: Destination Schema is Derived from a Schema Feature article. While using schema from feature, the Writer is looking for attribute{} list attribute that defines the schema. The workspace attached above creates the attribute{} attribute with the required schema. Please set up GDB Writer parameters to use the schema from feature and give it another try.

 

Badge

Hi @badams

thank you for the data sample. I will file a PR to investigate why FME reads Fe_INAA_ppm (as well as Ca_INAA_ppm in dynamic mode) as char attribute(s). I guess the problem is caused by the combination of Excel cell formatting (number with decimal places) and original values (integers).

Meanwhile, you could try altering the schema. Please take a look at the attached workspace which alters attribute{} list attribute that describes and defines the schema and then merges the list back onto the features to be used by the GDB Writer:

For more information about schema from attribute please check Dynamic Workflows: Destination Schema is Derived from a Schema Feature .

Hi @LenaAtSafe

 

 

Thank you for the attached workspace. I tried it again and it worked very nicely for the sample data I attached earlier. I'm hoping to be able to run the workspace on hundreds of xls (or xlsx) files at once and I just tried it with 3 to test. The decimal issue is fixed!

 

 

Now, and I'm guessing this has to do with the FeatureMerger transformer, the featureclass headers are all merged and I'd like them to not be. Each featureclass has fields from all 3 input excel files (some are completely <null> because there was no actual column in excel) and I'd like to have them keep the original headers.

 

Alternatively, it might work to merge all the data but have only 1 output featureclass of all the merged data from the 3 input files (eventually hundreds).

 

 

Any advice here?

 

Badge

Hi @badams

thank you for the data sample. I will file a PR to investigate why FME reads Fe_INAA_ppm (as well as Ca_INAA_ppm in dynamic mode) as char attribute(s). I guess the problem is caused by the combination of Excel cell formatting (number with decimal places) and original values (integers).

Meanwhile, you could try altering the schema. Please take a look at the attached workspace which alters attribute{} list attribute that describes and defines the schema and then merges the list back onto the features to be used by the GDB Writer:

For more information about schema from attribute please check Dynamic Workflows: Destination Schema is Derived from a Schema Feature .

@LenaAtSafe also, I'm wondering if there's a way to set the Feature Class or Table Name in the GDB writer to "fme_basename" rather than "fme_feature_type"?

 

When I was running my other workspace, I added a dynamic reader, set fme_basename to be exposed and then I could select in the GDB writer that the table name = fme_basename. This caused the excel file names to be preserved in the output rather than the sheet names.
Badge
Hi @LenaAtSafe

 

 

Thank you for the attached workspace. I tried it again and it worked very nicely for the sample data I attached earlier. I'm hoping to be able to run the workspace on hundreds of xls (or xlsx) files at once and I just tried it with 3 to test. The decimal issue is fixed!

 

 

Now, and I'm guessing this has to do with the FeatureMerger transformer, the featureclass headers are all merged and I'd like them to not be. Each featureclass has fields from all 3 input excel files (some are completely <null> because there was no actual column in excel) and I'd like to have them keep the original headers.

 

Alternatively, it might work to merge all the data but have only 1 output featureclass of all the merged data from the 3 input files (eventually hundreds).

 

 

Any advice here?

 

Hi @badams

 

do you use schema definition from attribute{} attribute? Did you see the extra attributes in the output or on the features before they reach the Writer? You are right, FeatureMerger can add lots of extra attributes to the features, however, with schema defined by attribute{} only required attributes should be written. If you see a different behaviour, please share your current workspace and a small data sample - let's find what is causing the problem.

 

Badge
@LenaAtSafe also, I'm wondering if there's a way to set the Feature Class or Table Name in the GDB writer to "fme_basename" rather than "fme_feature_type"?

 

When I was running my other workspace, I added a dynamic reader, set fme_basename to be exposed and then I could select in the GDB writer that the table name = fme_basename. This caused the excel file names to be preserved in the output rather than the sheet names.

You can use fme_basename attribute value as a feature class name while writing in Dynamic mode as well. Have you tried setting your Writer Feature Class or Table Name parameter as shown above?

 

Badge

You can use fme_basename attribute value as a feature class name while writing in Dynamic mode as well. Have you tried setting your Writer Feature Class or Table Name parameter as shown above?

 

@LenaAtSafe thank you, this worked. I just had to go into the FeatureReader parameters and expose fme_basename

 

Badge
Hi @badams

 

do you use schema definition from attribute{} attribute? Did you see the extra attributes in the output or on the features before they reach the Writer? You are right, FeatureMerger can add lots of extra attributes to the features, however, with schema defined by attribute{} only required attributes should be written. If you see a different behaviour, please share your current workspace and a small data sample - let's find what is causing the problem.

 

@LenaAtSafe I don't see the option to define schema by attribute{}, only attribute{}.name and attribute{}.fme_data_type.

 

 

I'll attach 2 excel files and my .fmw workspace here.

 

 

sample-data-2.xlsx

 

sample-data-1.xlsx

 

workspace.fmw

 

 

Reply