Question

Inappropriate handling of non-numeric values in shapefile reader / writer

  • 21 February 2014
  • 3 replies
  • 2 views

Hi,

We've stumbled across what seems to be a bug (or perhaps a design decision that we haven't understood!) in the shapefile reader and writer.

The DBF format defines the type of a field in its header, but there is nothing physically preventing a field that is numeric from having text data stored in it (because internally, numbers are stored as text in the DBF format). Hence, it is up to any implementation of the DBF format to make sure that the correct types of values are written to a field, and to behave sensibly if the wrong types are read from the field.
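
To make the point concrete, here is a minimal Python sketch. It assumes the classic dBASE III layout (a 32-byte file header, 32-byte field descriptors ending in a 0x0D byte, and fixed-width text records); `build_dbf` and `read_dbf_raw` are hypothetical helpers written for this illustration, not part of FME or any library. It builds a one-field table in memory, declares the field as numeric, and still stores "A3030" in it.

```python
import struct

def build_dbf(field_name, field_type, width, raw_values):
    """Build a minimal single-field dBASE III table in memory (illustration only)."""
    header_len = 32 + 32 + 1   # file header + one field descriptor + 0x0D terminator
    record_len = 1 + width     # deletion flag + the field itself
    # version, date (YY MM DD), record count, header length, record length, padding
    header = struct.pack("<B3BIHH20x", 0x03, 114, 2, 21,
                         len(raw_values), header_len, record_len)
    # name (11 bytes), type (1 byte), 4 reserved, width, decimal count, padding
    descriptor = struct.pack("<11sc4xBB14x", field_name.encode("ascii"),
                             field_type.encode("ascii"), width, 0)
    # Nothing here checks that the values match the declared type.
    records = b"".join(b" " + v.encode("ascii").rjust(width) for v in raw_values)
    return header + descriptor + b"\r" + records + b"\x1a"

def read_dbf_raw(data):
    """Return the field definitions and the raw, unconverted text of every record."""
    n_records, header_len, record_len = struct.unpack_from("<IHH", data, 4)
    fields = []
    pos = 32
    while data[pos] != 0x0D:
        name, ftype, width = struct.unpack_from("<11sc4xB", data, pos)
        fields.append((name.split(b"\x00")[0].decode("ascii"),
                       ftype.decode("ascii"), width))
        pos += 32
    rows = []
    for i in range(n_records):
        start = header_len + i * record_len + 1   # +1 skips the deletion flag
        row = []
        for _, _, width in fields:
            row.append(data[start:start + width].decode("ascii"))
            start += width
        rows.append(row)
    return fields, rows

data = build_dbf("NUMBER", "N", 8, ["30", "A3030"])
fields, rows = read_dbf_raw(data)
print(fields)  # [('NUMBER', 'N', 8)]
print(rows)    # [['      30'], ['   A3030']]
```

The header says "N", but the record bytes are just padded text, so the type is only a promise that each implementation has to keep.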

FME doesn't seem to do this. We translated some data to shapefiles. The data have a column entitled "Number", which contains road numbers. We set up the shapefile writer to store this column in a numeric field without properly inspecting all the data. However, it turns out that the data occasionally contain text values such as "A3030" (reasonably enough, because that's how British roads are numbered).

I would argue that the expected or correct behaviour here would be for FME to realise that this was non-numeric data, and either to throw an error (as it would with a geodatabase data type) or to silently try to parse the data into a number. Even the CSV writer complains in this situation!
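
As a rough sketch of the two behaviours I mean (this is not how FME works internally; `format_numeric` and its `strict` flag are made up for illustration), a writer could validate each value before it goes into a numeric field. In strict mode it raises an error; in lenient mode it tries to parse the value and, as an assumption on my part, writes a blank (null) field when it cannot.

```python
import math

def format_numeric(value, width, decimals=0, strict=True):
    """Return value as right-justified text for a numeric DBF field, or reject it."""
    try:
        number = float(value)
        if not math.isfinite(number):      # "Inf" and "NaN" parse as floats but are not storable numbers
            raise ValueError("non-finite")
    except (TypeError, ValueError, OverflowError):
        if strict:
            raise ValueError(f"cannot store {value!r} in a numeric field") from None
        return " " * width                 # lenient mode: blank field, read back as null
    text = f"{number:.{decimals}f}"
    if len(text) > width:
        raise ValueError(f"{value!r} does not fit in {width} characters")
    return text.rjust(width)

print(repr(format_numeric("30", 8)))                   # '      30'
print(repr(format_numeric("A3030", 8, strict=False)))  # '        '
try:
    format_numeric("A3030", 8)
except ValueError as err:
    print(err)                                         # cannot store 'A3030' in a numeric field
```

Either policy would be fine by me; what matters is that text never lands in the numeric field unnoticed.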

Instead, the data are written as-is to the DBF file, so the numeric field ends up containing text. If you look at the DBF in a hex editor, you can see "A3030" sitting there.

If you then open the shapefile in a GIS program, you do not see the values: the program looks at the header, expects a number, finds text, and figures out that there is a problem. FME, however, on re-reading the shapefile will happily read "A3030" out of the numeric field and carry on (until some transformer further down the line, expecting to have got a number out of a numeric field, breaks).
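
For comparison, this is roughly the reader behaviour I would expect, sketched in Python. `parse_numeric` is a hypothetical helper, and returning `None` for unparseable text is my assumption about what a strict reader might do, not a description of any particular GIS program.

```python
import math

def parse_numeric(raw):
    """Convert the raw text of a numeric DBF field to int/float, or None if it is not a number."""
    text = raw.strip()
    if not text:
        return None                # blank field: treat as null
    try:
        return int(text)
    except ValueError:
        pass
    try:
        number = float(text)
    except ValueError:
        return None                # text in a numeric field: do not pass it downstream
    return number if math.isfinite(number) else None

raw_column = ["      30", "   A3030", "        ", "    12.5"]
values = [parse_numeric(raw) for raw in raw_column]
print(values)  # [30, None, None, 12.5]
```

With something like this in the reader, anything downstream that expects a number gets either a number or an explicit null, never "A3030".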

This reader behaviour is perhaps forgivable as FME being "robust" - as alluded to here:

http://fmepedia.safe.com/articles/Error_Unexpected_Behavior/FME-seems-to-misread-the-width-of-Number-fields-when-reading-a-Shape-dataset

- but I don't agree, because the user should be able to expect that if the reader has a numeric field, it will output numeric data in that field.

But I cannot see any justification for the writer behaviour. Could anyone explain whether this is a bug, or whether it is in some sense intended behaviour?

Harry

3 replies

Userlevel 4
Hi,

I just reproduced it using FME 2012 and FME 2014, so my guess would be that it's a design decision rather than a bug. I will leave it to Safe to argue whether this decision makes sense or not, though.

If you need some sort of field validation for your workflow, there is the AttributeClassifier that might be helpful.

David
Badge
Thanks David, for taking the time to reproduce this!

I'm aware of the AttributeClassifier, and it's helpful in a workflow where created number values might accidentally not contain a number (such as "Inf", which the AttributeCreator sometimes spits out, and not necessarily when it ought to, but that's another story!).

However, I don't feel it ought to be necessary in a simple workflow such as reading a shapefile and writing it straight back out to a geodatabase with an equivalent schema. A numerical attribute should contain only numbers and should therefore be storable in a numerical field. But if the output enforces this properly through code outside FME's control (e.g. geodatabase libraries or an RDBMS), then the translation will break in the situation described.

So to my mind this is a bug - I'll file a request to Safe to see if they agree.
Userlevel 4
Yes, for what it's worth, I agree with you on this one.

David
