I have CSVs with over 2 million rows. For some reason, Reader appears to be dropping some columns, and yet I know Reader recognised that they existed because their numbers are missing from the column name sequence. For example, when columns col3 and col6 are dropped, the auto-generated attribute names are col0, col1, col2, col4, col5, col7, col8, col9. I have tried various options in the reader but clearly not the right one. There is plenty of RAM and disk space. In some files the missing columns are blank, but I have also seen this happen in files that do not have blanks in the missing columns. Some columns have strings enclosed in double quotes. Any pointers, please?
That's really odd. Would you be able to post a sample file so we can take a closer look? Just the header line and a few lines of data should be enough
Here you go:
23,"I",405612,10008752428,"4620X029950346","3998563000",,"7666VN",2012-02-21,,2016-02-10,2012-12-13
23,"I",405692,37005945,"1725X042609307","10080803000",,"7666VN",2016-01-04,,2016-02-07,2015-12-02
23,"I",405753,10012134708,"1720X016961597","244731175",,"7666VN",2012-06-12,,2016-02-10,2012-12-13
Col6 and Col9 are skipped. I am using 2020.1.0.0 (20200707 - Build 20594 - WIN64)
Thanks.
This occurs if Scan for Types is set to Yes (I'm not sure whether 2020 now defaults to this behaviour). It doesn't look like the intended outcome of this setting in any case, as it's not just skipping columns but also putting some data in the wrong place and losing other bits.
Changing Scan for Types to No reads all the columns correctly.
@fme_superuser This is a known issue and has been fixed for FME 2021.0. It occurs when the CSV file has no headers and there are blank columns. Column numbering gets offset. Issue FMEENGINE-66612
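To check whether a file matches the conditions described above (no header row, plus columns that are blank in every row), something like the following rough Python sketch might help. It is not part of FME; it only uses Python's standard `csv` module, and the function name `blank_columns` is made up for illustration. The sample rows are the ones posted earlier in this thread.

```python
import csv
import io

# Sample rows copied from the thread (headerless CSV).
SAMPLE = '''23,"I",405612,10008752428,"4620X029950346","3998563000",,"7666VN",2012-02-21,,2016-02-10,2012-12-13
23,"I",405692,37005945,"1725X042609307","10080803000",,"7666VN",2016-01-04,,2016-02-07,2015-12-02
23,"I",405753,10012134708,"1720X016961597","244731175",,"7666VN",2012-06-12,,2016-02-10,2012-12-13
'''


def blank_columns(text):
    """Return (column_count, zero-based indexes of columns empty in every row).

    Hypothetical helper for illustration only; it is not an FME function.
    """
    # Skip completely empty lines, which csv.reader yields as [].
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if not rows:
        return 0, []
    width = max(len(row) for row in rows)
    blank = [
        i
        for i in range(width)
        # A column counts as blank if every row is missing it or has only whitespace.
        if all(i >= len(row) or row[i].strip() == "" for row in rows)
    ]
    return width, blank


width, blank = blank_columns(SAMPLE)
print(f"{width} columns, blank in every row: {blank}")
```

For the sample this reports 12 columns with zero-based columns 6 and 9 blank, which matches the two columns reported as skipped. For a 2-million-row file you would probably want to stream the rows from an open file handle rather than load them into a list.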
Do you know what Scan for Types defaults to in 2020? The sample file above is OS AddressBase data, so it might crop up quite a bit.
@ebygomm @fme_superuser Scan for Types defaults to Yes in 2020. You can change the default under Presets.
The problem only occurs if Scan for Types = Yes.
@fme_superuser @ebygomm We have ported this fix back into an FME 2020.2 patch. It should be in the next FME 2020.2 release that is available on our downloads page (any build after 20804).
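If you need to check a number of machines, a small script along these lines could compare an installed build against the threshold mentioned above. This is only a sketch: it assumes the version string has the same shape as the one quoted earlier in the thread ("... Build 20594 ..."), and that build numbers only ever increase. The names `build_number` and `has_backported_fix` are hypothetical helpers, not anything provided by FME.

```python
import re

# From the thread: the fix should be in any 2020.2 build after 20804.
FIXED_AFTER_BUILD = 20804


def build_number(version_string):
    """Extract the integer build number from a version string.

    Assumes the 'Build NNNNN' pattern seen in the thread.
    """
    match = re.search(r"Build\s+(\d+)", version_string)
    if match is None:
        raise ValueError(f"no build number found in {version_string!r}")
    return int(match.group(1))


def has_backported_fix(version_string):
    """True if the build is later than the threshold quoted in the thread."""
    return build_number(version_string) > FIXED_AFTER_BUILD


reported = "2020.1.0.0 (20200707 - Build 20594 - WIN64)"
print(build_number(reported), has_backported_fix(reported))
```

For the build reported in the original question (20594), the check comes back false, so on that install the workaround is still to set Scan for Types to No.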