Question

Short data types understood as long with the new Shapefile reader (FME 2016)


My shapefiles have attributes coded as short (integers with 6 digits), and FME 2016 reads them as long, so when I output them in a dynamic workflow their width becomes 10.

- If I manually force the type to short, the fields are only 5 digits wide, which is not OK...

- If I manually force the type to number with width 6 and precision 0, it is OK, but I can't do that dynamically across datasets with many shapefiles...

 

This happens because FME 2016 now supports the short/long etc. data types for shapefiles. With FME 2015, my data was correctly understood as number with 6 digits in dynamic workflows.

Is this a bug, or can I automatically force all such attributes to numbers with 6 digits?


21 replies

Userlevel 2
Badge +17

Hi @johann, I'm surprised to hear that the 'number(n,p)' style numeric type definition can no longer be read by the Shapefile reader in FME 2016. If every 'long' type should be changed to the 'number(6,0)' type, I would read the source Shapefiles with a FeatureReader and modify the schema definitions with a script, e.g.


# PythonCaller Script Example
# Change every 'long' entry in a schema feature's definition to 'number(6,0)'.
def processFeature(feature):
    # attribute{}.native_data_type is a list attribute: one data type per field.
    types = feature.getAttribute('attribute{}.native_data_type')
    if not types:
        return  # not a schema feature; nothing to change
    for i, t in enumerate(types):
        if t == 'long':
            feature.setAttribute('attribute{%d}.native_data_type' % i, 'number(6,0)')
            feature.setAttribute('attribute{%d}.fme_data_type' % i, 'fme_decimal(6,0)')

I think it would be better if the Shapefile reader had an option to read the data types in the 'number(n,p)' style, to keep compatibility with traditional schema definitions.

Userlevel 4

Sounds like there might be a bug in the mapping file; somebody from Safe should chime in here (@daleatsafe). The mapping file changed quite a bit from 2015 to 2016.

To me it looks a bit weird that fme_int16 maps to short, while fme_uint16 maps to long:

               short                   fme_int16                  \
               short                   fme_int8                   \
               short                   fme_uint8                  \
               long                    fme_int32                  \
               long                    fme_uint16                 \

The mapping file in question is <FME>\metafile\dbftypes_esrishape.fmi

Userlevel 4
Badge +25
I'll ask a developer to check this out. But I wonder if it's a difference between signed and unsigned? A short integer is 16 bits, so zero (0) to 65,535 (unsigned) or -32,768 to +32,767 (signed). The latter format uses 6 digits, and maybe that's what Esri uses. I'll get back to you.
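
For reference, here is a minimal Python sketch of that digit arithmetic (my own illustration, not FME code): the signed 16-bit minimum needs six characters in a DBF numeric field because of the minus sign, while the unsigned maximum fits in five.

# Quick check of the 16-bit ranges and how many DBF characters
# (digits plus an optional minus sign) the extremes need.
for name, lo, hi in [('int16 (signed)', -2**15, 2**15 - 1),
                     ('uint16 (unsigned)', 0, 2**16 - 1)]:
    width = max(len(str(lo)), len(str(hi)))
    print(name, lo, 'to', hi, '-> width', width)
# int16 (signed) -32768 to 32767 -> width 6
# uint16 (unsigned) 0 to 65535 -> width 5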

 

Userlevel 4
Badge +25

I've filed this with our developers to look at (PR#71956)

However, I would point you to this Esri info page.

It says that "By default in ArcGIS for Desktop, short integers are created with a precision of 5." - so that's probably why we went for that value. But I guess it doesn't mean you can't have a short with a precision of 6. We'll see what our development team say.

Userlevel 2
Badge +17

I've filed this with our developers to look at (PR#71956)

However, I would point you to this Esri info page.

It says that "By default in ArcGIS for Desktop, short integers are created with a precision of 5." - so that's probably why we went for that value. But I guess it doesn't mean you can't have a short with a precision of 6. We'll see what our development team say.

I don't think Esri's description applies to the Shapefile format. In my understanding, 'short' and 'long' for the Shapefile format are just aliases for 'number(5,0)' and 'number(10,0)' in the DBF format. Therefore, 'short' cannot store the full range of a signed 16-bit integer (the minimum, -32768, requires 6 characters including the negative sign). Likewise, 'long' cannot store the full range of a signed 32-bit integer (the minimum, -2147483648, requires 11 characters).

 

In my personal opinion, the 'short' and 'long' in the Shapefile format should be mapped to 'fme_decimal(w,p)' like this, rather than fme_int*/fme_uint*.

 

short    fme_decimal(5,0)
long    fme_decimal(10,0)
and 'fme_int*' and 'fme_uint*' should be mapped to a 'number(w,0)' whose width is equal to or greater than the maximum number of digits, including the negative sign, for that integer type, e.g. (a quick check of these widths is sketched right after this list):

 

number(6,0)    fme_int16
number(6,0)    fme_uint16
number(11,0)    fme_int32
number(11,0)    fme_uint32
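
A small Python sketch of that width arithmetic (an illustration, not FME's mapping code); the unsigned types in the list above are simply rounded up to the width of their signed counterparts, which also satisfies the "equal to or greater" rule:

# Minimum DBF width (digit count, including a possible minus sign)
# needed to hold each FME integer type without truncation.
ranges = {
    'fme_int16':  (-2**15, 2**15 - 1),
    'fme_uint16': (0, 2**16 - 1),
    'fme_int32':  (-2**31, 2**31 - 1),
    'fme_uint32': (0, 2**32 - 1),
}
for fme_type, (lo, hi) in ranges.items():
    width = max(len(str(lo)), len(str(hi)))
    print('number(%d,0)' % width, fme_type)
# number(6,0) fme_int16
# number(5,0) fme_uint16
# number(11,0) fme_int32
# number(10,0) fme_uint32
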
Userlevel 4
I don't think Esri's description applies to the Shapefile format. In my understanding, 'short' and 'long' for the Shapefile format are just aliases for 'number(5,0)' and 'number(10,0)' in the DBF format. Therefore, 'short' cannot store the full range of a signed 16-bit integer (the minimum, -32768, requires 6 characters including the negative sign). Likewise, 'long' cannot store the full range of a signed 32-bit integer (the minimum, -2147483648, requires 11 characters).

 

In my personal opinion, the 'short' and 'long' in the Shapefile format should be mapped to 'fme_decimal(w,p)' like this, rather than fme_int*/fme_uint*.

 

short    fme_decimal(5,0)
long    fme_decimal(10,0)
and 'fme_int*' and 'fme_uint*' should be mapped to a 'number(w,0)' whose width is equal to or greater than the maximum number of digits, including the negative sign, for that integer type, e.g.

 

number(6,0)    fme_int16
number(6,0)    fme_uint16
number(11,0)    fme_int32
number(11,0)    fme_uint32
I agree. Interestingly, if I create a new shape file using ArcCatalog 10.2.2, I can only create a Short Integer field with up to 4 digits (precision); beyond that it gets automatically converted to a Long Integer with the specified precision. Examples:

 

ArcCatalog data type                     DBF data type
Long integer, precision set to 6         number(6,0)
Long integer, precision not specified    number(9,0)
Short integer, precision not specified   number(4,0)
Double, precision set to 16,5            number(17,5)

Even ESRI seems a little bit confused...

 

 

Userlevel 4
I agree. Interestingly, if I create a new shape file using ArcCatalog 10.2.2, I can only create a Short Integer field with up to 4 digits (precision); beyond that it gets automatically converted to a Long Integer with the specified precision. Examples:

 

ArcCatalog data type                     DBF data type
Long integer, precision set to 6         number(6,0)
Long integer, precision not specified    number(9,0)
Short integer, precision not specified   number(4,0)
Double, precision set to 16,5            number(17,5)

Even ESRI seems a little bit confused...

 

 

I just found this page with some interesting info (it's for version 9.3 but the shape specification hasn't changed in a loooong time):

 

How data converts when importing

 

 

In short, numbers of length 1-4 are considered Short Integers, numbers of length 5-9 are considered Long Integers.

 

 

This confirms my findings in ArcCatalog 10.2.2 as well. So when ESRI says a Short Integer should cover -32,768 to 32,767, that does not seem to be true for shape files; it will only cover the equivalent of fme_int8 (-128 to 127) or fme_uint8 (0 to 255), but not fme_int16. So I believe the mapping file needs to be looked into further.
Userlevel 2
Badge +17
I agree. Interestingly, if I create a new shape file using ArcCatalog 10.2.2, I can only create a Short Integer field with up to 4 digits (precision); beyond that it gets automatically converted to a Long Integer with the specified precision. Examples:

 

ArcCatalog data type                     DBF data type
Long integer, precision set to 6         number(6,0)
Long integer, precision not specified    number(9,0)
Short integer, precision not specified   number(4,0)
Double, precision set to 16,5            number(17,5)

Even ESRI seems a little bit confused...

 

 

In my test with ArcCatalog 10.3.1, the result was a bit different:

ArcCatalog 10.3.1 data type              DBF data type
Long Integer, precision set to 6         number(6,0)
Long Integer, precision not specified    number(10,0)
Short Integer, precision not specified   number(5,0)
Double, precision set to 16,5            number(17,5)

Anyway, I don't think it's good that the pseudo 'short' and 'long' for Shapefile are mapped to fme_int* types.

 

Userlevel 4
I just found this page with some interesting info (it's for version 9.3 but the shape specification hasn't changed in a loooong time):

 

How data converts when importing

 

 

In short, numbers of length 1-4 are considered Short Integers, numbers of length 5-9 are considered Long Integers.

 

 

This confirms my findings in ArcCatalog 10.2.2 as well. So when ESRI says a Short Integer should cover -32,768 to 32,767, that does not seem to be true for shape files; it will only cover the equivalent of fme_int8 (-128 to 127) or fme_uint8 (0 to 255), but not fme_int16. So I believe the mapping file needs to be looked into further.
That's very interesting, @takashi.

 

 

If I create a shape file using ArcGIS 10.2.2 it looks like this:

 

A: Long integer -> number(9,0)

 

B: Short integer -> number(4,0)

 

 

If I use a dynamic workspace to translate the shape file, the resulting shape file has:

 

A: number(10,0)

 

B: number(5,0)

 

So FME seems to follow ArcGIS 10.3.1 conventions here, rather than preserving the exact precision.

 

 

If I open it in ArcCatalog 10.2.2 it reports:

 

A: Long integer

 

B: Long integer

 

 

If I open the same shape file in ArcCatalog 10.3.1 I get:

 

A: Long integer

 

B: Short integer

 

 

So I'm wondering if it is the change in the integer data types between ArcGIS 10.2 and 10.3 that creates this issue.

 

Userlevel 2
Badge +17
I agree. Interestingly, if I create a new shape file using ArcCatalog 10.2.2, I can only create a Short Integer field with up to 4 digits (precision); beyond that it gets automatically converted to a Long Integer with the specified precision. Examples:

 

ArcCatalog data type                     DBF data type
Long integer, precision set to 6         number(6,0)
Long integer, precision not specified    number(9,0)
Short integer, precision not specified   number(4,0)
Double, precision set to 16,5            number(17,5)

Even ESRI seems a little bit confused...

 

 

Really interesting. I also guess that Safe intended to follow the default interpretation of ArcGIS 10.3.1.

 

This workflow (FME 2016.1.1) may illustrate why the 'short' should not be mapped to either 'fme_int16' or 'fme_uint16'. The value of A (-32768) cannot be written into the destination Shapefile even though it is within the range of 'fme_int16', whereas the value of D (99999) can be written even though it is out of range of 'fme_uint16'.
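
That behaviour is consistent with a plain character-width check rather than an integer-range check; a rough Python sketch of the idea (not FME's actual logic):

# A DBF numeric field stores formatted text, so what matters is the
# character width, not the binary integer range.
def fits_in_number(value, width):
    return len(str(int(value))) <= width

print(fits_in_number(-32768, 5))  # False: '-32768' needs 6 characters, although it fits fme_int16
print(fits_in_number(99999, 5))   # True: 5 characters, although it exceeds fme_uint16's maximum of 65535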

 

Userlevel 4
Badge +25

I've filed this with our developers to look at (PR#71956)

However, I would point you to this Esri info page.

It says that "By default in ArcGIS for Desktop, short integers are created with a precision of 5." - so that's probably why we went for that value. But I guess it doesn't mean you can't have a short with a precision of 6. We'll see what our development team say.

We'll have to see. I don't know whether it's a mapping fix or not. Remember, there is no option to set precision in the Shape feature type. If you set it to short you get 5 digits, set it to long and you get 10. Maybe we just need to change the writer to use 6 digits for a short, not 5.

 

 

Whether ArcGIS would recognize it as a short is another matter! I'm pretty sure it would recognize that as a long. How the OP got a 6-digit field recognized as a short by ArcGIS is interesting! I'm thinking perhaps it wasn't created by ArcGIS?

 

 

The mapping does seem odd to me though. I'd have thought int16 should map to long (needing a sign) but uint16 should be a short. But I am probably missing something.

 

 

Also, it's interesting that ArcGIS allows you to set precision for a short/long. For example, create a Shapefile in ArcGIS and you can create a Long field set to precision 7. But round-trip through FME and the output will have a precision of 10. I don't know of any user who has complained about that, but it's an interesting case. It's another reason I think this isn't a mapping fix.

 

 

Anyway, I asked Dale to take a look (as well as filing a PR) so we shall see what happens.

 

Userlevel 4
We'll have to see. I don't know whether it's a mapping fix or not. Remember, there is no option to set precision in the Shape feature type. If you set it to short you get 5 digits, set it to long and you get 10. Maybe we just need to change the writer to use 6 digits for a short, not 5.

 

 

Whether ArcGIS would recognize it as a short is another matter! I'm pretty sure it would recognize that as a long. How the OP got a 6-digit field recognized as a short by ArcGIS is interesting! I'm thinking perhaps it wasn't created by ArcGIS?

 

 

The mapping does seem odd to me though. I'd have thought int16 should map to long (needing a sign) but uint16 should be a short. But I am probably missing something.

 

 

Also, it's interesting that ArcGIS allows you to set precision for a short/long. For example, create a Shapefile in ArcGIS and you can create a Long field set to precision 7. But round-trip through FME and the output will have a precision of 10. I don't know of any user who has complained about that, but it's an interesting case. It's another reason I think this isn't a mapping fix.

 

 

Anyway, I asked Dale to take a look (as well as filing a PR) so we shall see what happens.

 

It's a very interesting subject, for sure. I also noticed that you could set precision in ArcCatalog.

 

Because of this I think FME should just preserve whatever precision was set in the shape file and not try to translate it back and forth into long/short integers; that is bound to lead to errors or misunderstandings.

 

Userlevel 4
Badge +13

Wow, great feedback. Reads almost like a mystery novel. Shape files are truly exciting, even after all these years.

Seems like we should make some minor tweaks. Feels like our reader should never report the 'pseudo-types' and should instead always report the number(x,y) that was in the DBF. And the writer should either allow an optional precision on the short/long, OR alter our idea of what a short is so we can store an int16, OR believe in ArcGIS's idea of short and NOT map to it from our own uint types.

 

Great findings, thanks all. I'll pass this on to the team.
Userlevel 2
Badge +17
We'll have to see. I don't know whether it's a mapping fix or not. Remember, there is no option to set precision in the Shape feature type. If you set it to short you get 5 digits, set it to long and you get 10. Maybe we just need to change the writer to use 6 digits for a short, not 5.

 

 

Whether ArcGIS would recognize it as a short is another matter! I'm pretty sure it would recognize that as a long. How the OP got a 6-digit field recognized as a short by ArcGIS is interesting! I'm thinking perhaps it wasn't created by ArcGIS?

 

 

The mapping does seem odd to me though. I'd have thought int16 should map to long (needing a sign) but uint16 should be a short. But I am probably missing something.

 

 

Also, it's interesting that ArcGIS allows you to set precision for a short/long. For example, create a Shapefile in ArcGIS and you can create a Long field set to precision 7. But round-trip through FME and the output will have a precision of 10. I don't know of any user who has complained about that, but it's an interesting case. It's another reason I think this isn't a mapping fix.

 

 

Anyway, I asked Dale to take a look (as well as filing a PR) so we shall see what happens.

 

We can set a 'number(w,p)' type in the Shapefile writer, so it may not be necessary to fix the writer. I think the reader should be improved to expose numeric types in the 'number(w,p)' style as-is. Honestly, I don't need the pseudo 'short', 'long', 'float', 'double' for the Shapefile format. They just mislead users.

 

Userlevel 4
Badge +13

In case you're interested in even more... this article is well worth the read. In particular, this part (which talks about the variation between versions of ArcGIS):

What you are seeing is a change to the behavior of dBase/Shapefiles introduced at 10.3.1. Normally a Short Integer can hold numbers from -32767 to +32767, but dBase doesn't store a Short. It stores an N type. Prior to 10.3.1 a Short was defined as a N 4. An N 4 can hold numbers from -999 to 9999. Not the full range of a Short. At 10.3.1 the Short definition (or our mapping of it) was changed to N 5 to allow a range of -9999 to +32767. In your case this changed the definition of a N 5, previously mapped as a Long, to a Short. This was changed to fix a problem with silent truncation of -9999 to -999 in Short integer fields and to update the range allowed to be closer to what is normally allowed in a Short. This had some unintended consequences that we will be correcting in 10.4.
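
Putting the thresholds reported in this thread together, the version-dependent interpretation could be sketched in Python like this (an illustration only, not Esri's actual implementation; the Double fallback for wider fields is an assumption):

# Illustrative mapping of a DBF 'N w' field (scale 0) to an ArcGIS field type,
# based on the rules reported in this thread; not Esri's actual code.
def arcgis_integer_type(width, arcgis_10_3_1_or_later):
    short_max = 5 if arcgis_10_3_1_or_later else 4   # Short was N 4, became N 5 at 10.3.1
    long_max = 10 if arcgis_10_3_1_or_later else 9
    if width <= short_max:
        return 'Short Integer'
    if width <= long_max:
        return 'Long Integer'
    return 'Double'  # assumption: wider fields fall back to a floating-point type

print(arcgis_integer_type(5, False))  # Long Integer  (matches the ArcCatalog 10.2.2 tests above)
print(arcgis_integer_type(5, True))   # Short Integer (matches the ArcCatalog 10.3.1 tests above)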

Userlevel 2
Badge +17

I noticed the pseudo "short" and "long" still appear in the Shapefile reader feature type in FME 2017.

I do not think it's good that the Shapefile reader converts the native "number(n,0)" type to the pseudo "short" or "long" implicitly, as discussed in this thread last summer.

What do you think about this? @daleatsafe, @Mark2AtSafe, @david_r

Userlevel 2
Badge +17

I noticed the pseudo "short" and "long" still appear in the Shapefile reader feature type in FME 2017.

I do not think it's good that the Shapefile reader converts the native "number(n,0)" type to the pseudo "short" or "long" implicitly, as discussed in this thread last summer.

What do you think about this? @daleatsafe, @Mark2AtSafe, @david_r

I'd like to see the native "number(n,0)" as-is in the reader feature type, rather than "short" and "long".

 

 

 

Userlevel 4
Badge +13

I noticed the pseudo "short" and "long" still appear in the Shapefile reader feature type in FME 2017.

I do not think it's good that the Shapefile reader converts the native "number(n,0)" type to the pseudo "short" or "long" implicitly, as discussed in this thread last summer.

What do you think about this? @daleatsafe, @Mark2AtSafe, @david_r

Hi Takashi -- seems like we should give this another look and compare against ArcGIS again. We were trying to mimic/give back what ArcGIS does -- that was the goal. But it sounds like it might be better to be less clever on the reading side. We can leave the pseudo types on writing, but then you wouldn't get them back if you read the data with FME later.

 

 

Userlevel 2
Badge +17

Sadly, I found that the pseudo "float" and "double" also appear in the reader feature type in FME 2017.1, and there is an inconsistency between the writer and the reader. Worse than before.

The major issues caused by the pseudo data types are:

  • With a dynamic workflow whose source and destination formats are both Esri Shapefile, some data types could be changed implicitly (e.g. number(2,0) -> short, i.e. number(5,0)).
  • A value of the number(5,0) type (-9,999 to 99,999) could overflow the valid range of the "short" type (fme_int16: -32,768 to 32,767).
  • A value of the number(10,0) type (-999,999,999 to 9,999,999,999) could overflow the valid range of the "long" type (fme_int32: -2,147,483,648 to 2,147,483,647).

In my personal view, the pseudo data types should be retired, at least in the reader schema. Only in the writer feature type (User Attributes tab) might the pseudo type names be useful, as aliases for certain "number(w,p)" types.
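
A toy Python model of the implicit widening described in the first bullet above (thresholds inferred from the examples in this thread, not taken from FME's mapping files):

# Reader collapses narrow integer fields to the pseudo 'short'/'long';
# the writer expands those back to fixed widths of 5 and 10.
def read_as_pseudo_type(native_type):            # e.g. 'number(2,0)'
    width = int(native_type.split('(')[1].split(',')[0])
    return 'short' if width <= 5 else 'long'

def write_pseudo_type(pseudo_type):
    return 'number(5,0)' if pseudo_type == 'short' else 'number(10,0)'

print(write_pseudo_type(read_as_pseudo_type('number(2,0)')))  # number(5,0): the width silently grew
print(write_pseudo_type(read_as_pseudo_type('number(6,0)')))  # number(10,0): the original problem in this thread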

Userlevel 2
Badge +17

Sadly, I found that the pseudo "float" and "double" also appear in the reader feature type in FME 2017.1, and there is an inconsistency between the writer and the reader. Worse than before.

The major issues caused by the pseudo data types are:

  • With a dynamic workflow whose source and destination formats are both Esri Shapefile, some data types could be changed implicitly (e.g. number(2,0) -> short, i.e. number(5,0)).
  • A value of the number(5,0) type (-9,999 to 99,999) could overflow the valid range of the "short" type (fme_int16: -32,768 to 32,767).
  • A value of the number(10,0) type (-999,999,999 to 9,999,999,999) could overflow the valid range of the "long" type (fme_int32: -2,147,483,648 to 2,147,483,647).

In my personal view, the pseudo data types should be retired, at least in the reader schema. Only in the writer feature type (User Attributes tab) might the pseudo type names be useful, as aliases for certain "number(w,p)" types.

Looks good. FME 2015.1.3.2

 

 

Sadly, I found that the pseudo "float" and "double" also appear in the reader feature type in FME 2017.1, and there is an inconsistency between the writer and the reader. Worse than before.

The major issues caused by the pseudo data types are:

  • With a dynamic workflow whose source and destination formats are both Esri Shapefile, some data types could be changed implicitly (e.g. number(2,0) -> short, i.e. number(5,0)).
  • A value of the number(5,0) type (-9,999 to 99,999) could overflow the valid range of the "short" type (fme_int16: -32,768 to 32,767).
  • A value of the number(10,0) type (-999,999,999 to 9,999,999,999) could overflow the valid range of the "long" type (fme_int32: -2,147,483,648 to 2,147,483,647).

In my personal view, the pseudo data types should be retired, at least in the reader schema. Only in the writer feature type (User Attributes tab) might the pseudo type names be useful, as aliases for certain "number(w,p)" types.

 

I agree with you, Takashi!

 
