Solved

Update metadata of a feature class in an ArcGIS Pro File Geodatabase

  • 8 December 2022
  • 8 replies


We run automatic data updates on an Esri file geodatabase. As part of the same update procedure, I also want to update some parts of the metadata automatically: Edition, Publication date, Revision date and dataset ID.

I managed to get the right values into the XMLUpdater, but it doesn't seem to change anything in the metadata; I don't see any result. Could anybody help me out? I added the workspace to this question.

Addition 9/12/2022:

I just discovered that if I fill out these fields manually in ArcGIS Pro, the information is not found in the metadata that FME reads. This is what's filled out and shown in ArcGIS Pro:

I used the XMLFormatter method to see what's in the metadata; the attached log shows the result. The information shown in ArcGIS Pro cannot be found in the XML read by FME.

Exporting the metadata from ArcGIS Pro to XML, however, gives different (and correct) information:

<identificationInfo>
  <MD_DataIdentification>
    <citation>
      <CI_Citation>
        <title>
          <gco:CharacterString>Bedrijventerreinen OSLO, Vlaanderen - Beheerde bedrijvenzone</gco:CharacterString>
        </title>
        <date>
          <CI_Date>
            <date>
              <gco:Date>2022-12-09</gco:Date>
            </date>
            <dateType>
              <CI_DateTypeCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#CI_DateTypeCode" codeListValue="publication" codeSpace="ISOTC211/19115">publication</CI_DateTypeCode>
            </dateType>
          </CI_Date>
        </date>
        <date>
          <CI_Date>
            <date>
              <gco:Date>2022-12-05</gco:Date>
            </date>
            <dateType>
              <CI_DateTypeCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#CI_DateTypeCode" codeListValue="revision" codeSpace="ISOTC211/19115">revision</CI_DateTypeCode>
            </dateType>
          </CI_Date>
        </date>
        <edition>
          <gco:CharacterString>Toestand 5 december 2022</gco:CharacterString>
        </edition>
        <editionDate>
          <gco:Date>2022-12-06</gco:Date>
        </editionDate>
        <identifier>
          <MD_Identifier>
            <code>
              <gco:CharacterString>cd379da9-80a5-4fa4-b456-0990f8c83e0a</gco:CharacterString>
            </code>
          </MD_Identifier>
        </identifier>

Any idea how this is possible? Or any tricks to help me out?


Best answer by boydfme 12 December 2022, 21:25


8 replies

Userlevel 1
Badge +11

Did you have a look at the Step-by-step Instructions in this article: Working with Geodatabase Metadata: Writing/Updating Metadata?

Badge


Yes, I followed the steps in that article, but nothing is updated in the metadata.

Userlevel 3
Badge +17

Hi @ireen

If the "Value Type" field in XMLUpdater is set to XML/XQuery, an attribute value must be specified in the Value field using the fme:get-attribute("<attributeName>") syntax. If you want to use the @Value(<attributeName>) syntax, please switch "Value Type" to Plain Text.

Edit: it may also be related to how your update features are structured. Can you try having one update feature with the four attributes containing the update values? For example, one feature routed to the Update port with the attributes Publication, Toestand, Edition, DatasetIdentificator. It may help to have a sample of the input data as well.

Badge


Hi @debbiatsafe,

I tried that first, but it didn't work, so I changed the Value Type to XML/XQuery. I have set it back to Plain Text now, but the result is still the same (nothing is updated).


Badge

For a client project I worked on, updates came into an FGDB via an FME ETL process. We found it easier, faster, and better practice to use the ArcGIS Pro metadata functions (arcpy.metadata) together with XPaths to perform the metadata XML updates. What pushed us in this direction was that the XML edits made in FME never seemed to get "saved" back to the FGDB metadata (or maybe we were just missing something, not sure). XPath edits are also very fast.

We put this in a Python shutdown script in the FME workspace that updated the FGDB. That way, the workspace finished first, and the metadata updates only ran after the FGDB ETL process had finished successfully. Everything stayed in sync.

We used this Python function to upgrade the metadata (if needed), synchronize it, and then save the feature class metadata out as XML to a scratch location. That XML is then used in the update process further below:

import arcpy
from arcpy import metadata as md


def update_sync_saveOut(source, export_location):
    '''Client needed an automated way to upgrade, synchronize,
    and save out (All Content) the metadata of each feature class and
    dataset in their FGDB. This function is the basis of that automation
    and follows Esri's step-by-step process for these actions in ArcGIS Pro.
    Takes a source metadata location and an XML export location.
    Uses: src_item_md.saveAsXML(export_location, 'EXACT_COPY')
    Returns a text log of what happened.
    '''
    text = '##################\n'
    text += str(source) + '\n'
    text += str(export_location) + '\n'

    # Metadata of the source item
    src_item_md = md.Metadata(source)
    try:
        # Upgrade; feel free to omit if you don't need this step or don't use the FGDC format
        src_item_md.upgrade('FGDC_CSDGM')
        text += 'Upgraded metadata for {}\n'.format(source)
    except Exception:
        text += 'Upgrade not needed for {}\n'.format(source)
    try:
        # Synchronize the metadata with the item's current properties
        src_item_md.synchronize('ALWAYS')
        text += 'Synchronized\n'
        # Save out an exact copy of the metadata as XML
        src_item_md.saveAsXML(export_location, 'EXACT_COPY')
        text += 'Saved out successfully.\n'
    except Exception as e:
        text += 'Synchronize/save failed: {}\n'.format(e)

    text += arcpy.GetMessages(0) + '\n'
    return text

Once the metadata XML has been saved out, you can modify it with XPaths like this (adapt it to your needs; we only update the date fields in ours):

import datetime

import arcpy
from arcpy import metadata as md
from lxml import etree as ET

text = ''  # compile all messages along the way

src_xml = "path to your feature class xml that was saved out from above function"  # raw xml
scratch = "xml path"  # scratch copy of the raw xml (sensitive info removed) to edit and save
fet_path = "path to the target feature class"  # feature class the edited/updated metadata goes to

# Save a filtered copy of the metadata to scratch ('REMOVE_ALL_SENSITIVE_INFO')
try:
    src_item_md = md.Metadata(src_xml)
    src_item_md.saveAsXML(scratch, 'REMOVE_ALL_SENSITIVE_INFO')  # removes unnecessary geoprocessing history etc.
    text += 'Saved filtered copy (REMOVE_ALL_SENSITIVE_INFO)\n'
except Exception:
    text += arcpy.GetMessages(0)

tree = ET.parse(scratch)

# Your xpaths (ArcGIS metadata format element names).
# Note: identCode holds the dataset identifier, not a date; if it exists in
# your metadata, give it its own value instead of today's date.
dexpaths = ["//idCitation//pubDate", "//idCitation//resEdDate",
            "//idCitation//reviseDate", "//idCitation//identCode"]

new_date = datetime.datetime.now().strftime('%Y%m%d')  # e.g. 20221202
for de in dexpaths:
    old_values = tree.xpath(de + '//text()')
    if not old_values:
        text += 'Does not have element:  {}//text(). Passing...\n'.format(de)
        continue
    for elem in tree.xpath(de):
        elem.text = new_date
        text += de + ' Changed from: ' + old_values[0] + ' to: ' + elem.text + '\n'

# Save date edits to xml
tree.write(scratch, encoding='utf-8', xml_declaration=True)

# Import the edited metadata into the feature class in the FGDB (ArcGIS Pro)
src_template_md = md.Metadata(scratch)
tgt_item_md = md.Metadata(fet_path)
tgt_item_md.copy(src_template_md)
tgt_item_md.save()

text += arcpy.GetMessages(0) + '\n'

Message output from a successful edit and import into the feature class:

Saved filtered copy (REMOVE_ALL_SENSITIVE_INFO)

//idCitation//pubDate Changed from: 20220526 to: 20221202

//idCitation//reviseDate Changed from: 20220526 to: 20221202

//idCitation//resEdDate Changed from: 20220526 to: 20221202

Does not have element: //idCitation//identCode//text(). Passing...

Start Time: Friday, December 2, 2022 9:41:00 AM

Succeeded at Friday, December 2, 2022 9:41:00 AM (Elapsed Time: 0.08 seconds)
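The edit loop in the script above can be tried without ArcGIS Pro or lxml. The sketch below swaps in the standard library's xml.etree.ElementTree, which supports only a subset of XPath: each path starts with './/' and the text is read from .text rather than through a text() step. It runs against a small, hypothetical ArcGIS-format fragment seeded with the old dates from the log. The helper name update_dates is made up for illustration, and identCode is left out on the assumption that only date fields should receive a date.

```python
import datetime
import xml.etree.ElementTree as ET

# Hypothetical, simplified ArcGIS-format metadata: only the elements the loop touches.
SAMPLE = """<metadata>
  <dataIdInfo>
    <idCitation>
      <date>
        <pubDate>20220526</pubDate>
        <reviseDate>20220526</reviseDate>
      </date>
      <resEdDate>20220526</resEdDate>
    </idCitation>
  </dataIdInfo>
</metadata>"""

# ElementTree paths are relative to an element, so './/' replaces the leading '//'.
DATE_PATHS = [".//idCitation//pubDate", ".//idCitation//resEdDate",
              ".//idCitation//reviseDate"]


def update_dates(root, paths, new_value):
    """Set the text of every element matched by each path; return log lines."""
    log = []
    for path in paths:
        matches = root.findall(path)
        if not matches:
            log.append("Does not have element: {}. Passing...".format(path))
            continue
        for elem in matches:
            old_value = elem.text
            elem.text = new_value
            log.append("{} Changed from: {} to: {}".format(path, old_value, new_value))
    return log


root = ET.fromstring(SAMPLE)
# '%Y' gives the four-digit year directly, so the '20%y' trick is not needed.
today = datetime.date.today().strftime("%Y%m%d")
for line in update_dates(root, DATE_PATHS, today):
    print(line)
```

On a real file you would start from ET.parse(path).getroot() and write the tree back before importing it into the feature class, as the script does.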

Badge

Thank you for sharing your approach to this issue. I'll try it this way. Thanks so much!

Badge


How do I find the XPaths in the XML? In the XML tree I have several similar-looking paths for the different date types (creation, publication, revision):

So //idCitation//date is ambiguous. How do I find the correct XPaths?

Badge

XPath uses path expressions to select nodes or node-sets in an XML document, which you can then use to read or modify text or elements. So some understanding of XML structure and XPath syntax helps.

Example 1:

Say I had this section of a metadata XML and wanted to pull out the ModDate text (the highlighted part):

The XPath would be "//Esri//ModDate", which returns the element objects. To pull out the date text itself (20220429), add the text() step: "//Esri//ModDate//text()" returns the date text in a list, so printing it gives ['20220429']. You can then use Python list indexing to get the actual date value, as in my previous example, by adding [0].

So what is this short XPath saying in layman's terms? Find every Esri element anywhere in the document, then every ModDate element anywhere below it, and return the text inside ModDate.
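The same Example 1 lookup, sketched with the standard library's ElementTree instead of lxml (a substitution made so it runs without extra packages): ElementTree paths must start with '.', have no text() step, and return elements whose .text holds the value. The one-line XML is a made-up stand-in for the screenshot.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata fragment: ModDate sits inside the Esri element.
SAMPLE = "<metadata><Esri><ModDate>20220429</ModDate></Esri></metadata>"

root = ET.fromstring(SAMPLE)

# './/Esri//ModDate' = any ModDate below any Esri element (lxml: '//Esri//ModDate').
mod_dates = [elem.text for elem in root.findall(".//Esri//ModDate")]
print(mod_dates)     # ['20220429']
print(mod_dates[0])  # 20220429

# A single '/' matches direct children only; here it finds the same element.
print([elem.text for elem in root.findall("./Esri/ModDate")])  # ['20220429']
```

The single-slash form is stricter: it only matches when the nesting is exactly as written, which helps when similar element names appear elsewhere in the file.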

Example 2:

Now let's look at a more nested example, where the same element occurs more than once. Say we wanted to add a date to, or modify the text of, the tmPosition element (the 'Unknown' text, underlined in blue):

Following the XML structure here, we would use the XPath '//dataExt//tempEle//exTemp//TM_Instant//tmPosition//text()', which gives us a list of all the tmPosition texts. In this example, we would get ['Unknown', 'Unknown']. Now that we have confirmed the XPath is right, you could update both 'Unknown' values with something like this:

# assumes tree (parsed with lxml), text and datetime from the script above
de = '//dataExt//tempEle//exTemp//TM_Instant//tmPosition'
for elem in tree.xpath(de):
    old_value = elem.text
    elem.text = datetime.datetime.now().strftime('%Y-%m-%d')
    text += de + ' Changed from: ' + str(old_value) + ' to: ' + elem.text + '\n'

If you only wanted to update, say, the last 'Unknown' element and leave the first one alone, you could use list indexing to look only at the last occurrence:

de = '//dataExt//tempEle//exTemp//TM_Instant//tmPosition'
matches = tree.xpath(de)
if matches:  # skip safely if the element does not exist
    matches[-1].text = datetime.datetime.now().strftime('%Y-%m-%d')
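Both variants of Example 2 fit in one small helper. This is a sketch using the standard-library ElementTree (not lxml) on a hypothetical fragment containing two 'Unknown' tmPosition elements; update_positions and its last_only flag are invented for illustration. Slicing with [-1:] instead of indexing with [-1] avoids an IndexError when nothing matches.

```python
import datetime
import xml.etree.ElementTree as ET

# Hypothetical fragment with two temporal extents, both 'Unknown'.
SAMPLE = """<metadata><dataIdInfo>
  <dataExt><tempEle><exTemp><TM_Instant><tmPosition>Unknown</tmPosition></TM_Instant></exTemp></tempEle></dataExt>
  <dataExt><tempEle><exTemp><TM_Instant><tmPosition>Unknown</tmPosition></TM_Instant></exTemp></tempEle></dataExt>
</dataIdInfo></metadata>"""

PATH = ".//dataExt//tempEle//exTemp//TM_Instant//tmPosition"


def update_positions(root, new_value, last_only=False):
    """Set tmPosition text; with last_only=True only the final match (document order) changes."""
    matches = root.findall(PATH)
    if last_only:
        matches = matches[-1:]  # empty list, not an IndexError, when nothing matches
    for elem in matches:
        elem.text = new_value
    return len(matches)


root = ET.fromstring(SAMPLE)
today = datetime.date.today().strftime("%Y-%m-%d")
update_positions(root, today, last_only=True)
# The first tmPosition stays 'Unknown'; the second now holds today's date.
print([elem.text for elem in root.findall(PATH)])
```

findall returns matches in document order, so "last" here means the last occurrence in the file, the same as tree.xpath(de)[-1] in lxml.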

Your example:

Looking at your screenshot, the XPath to get the "gco:Date" elements would be something like this:

'//identificationInfo//MD_DataIdentification//citation//CI_Citation//date//CI_Date//date//gco:Date'

and, for the date text itself:

'//identificationInfo//MD_DataIdentification//citation//CI_Citation//date//CI_Date//date//gco:Date//text()'

Note that these paths match both the publication and the revision date. You can experiment and confirm with this code snippet (just fill in the path to your XML):

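from lxml import etree as ET

md_xml = r"<your path to metadata xml here>"
tree = ET.parse(md_xml)

# Assumption: the export is standard ISO 19139, so the elements are in the gmd
# namespace and gco:Date is in the gco namespace. lxml needs both prefixes mapped;
# an unmapped 'gco:' prefix makes tree.xpath() raise an XPathEvalError.
ns = {'gmd': 'http://www.isotc211.org/2005/gmd',
      'gco': 'http://www.isotc211.org/2005/gco'}

de = '//gmd:identificationInfo//gmd:MD_DataIdentification//gmd:citation//gmd:CI_Citation//gmd:date//gmd:CI_Date//gmd:date//gco:Date'

de_text = de + '//text()'

text = tree.xpath(de_text, namespaces=ns)  # list of all matching date strings
print(text)
if text:
    print(text[0])  # first match only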

In theory, if I got the formatting right, this gives you a list of every instance of the highlighted parts you are looking for. My thought process, using your XML example:

There are many different ways to approach this, but hopefully this helps!
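Because the path above matches every CI_Date, it cannot tell the publication date from the revision date. One way to pick a specific date is to check each CI_Date's dateType code (codeListValue="publication" or "revision") before writing. The sketch below assumes a standard ISO 19139 export (gmd and gco namespaces), uses the standard-library ElementTree rather than lxml, and runs on a trimmed copy of the XML from the question. set_citation_date and set_citation_text are hypothetical helper names, and the new values are placeholders.

```python
import xml.etree.ElementTree as ET

# Assumption: standard ISO 19139 namespaces (gmd for elements, gco for values).
NS = {"gmd": "http://www.isotc211.org/2005/gmd",
      "gco": "http://www.isotc211.org/2005/gco"}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)  # keep gmd:/gco: prefixes if the tree is written out

# Trimmed from the question's export; namespace declarations added so it parses.
SAMPLE = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
 <gmd:identificationInfo><gmd:MD_DataIdentification><gmd:citation><gmd:CI_Citation>
  <gmd:date><gmd:CI_Date>
   <gmd:date><gco:Date>2022-12-09</gco:Date></gmd:date>
   <gmd:dateType><gmd:CI_DateTypeCode codeListValue="publication">publication</gmd:CI_DateTypeCode></gmd:dateType>
  </gmd:CI_Date></gmd:date>
  <gmd:date><gmd:CI_Date>
   <gmd:date><gco:Date>2022-12-05</gco:Date></gmd:date>
   <gmd:dateType><gmd:CI_DateTypeCode codeListValue="revision">revision</gmd:CI_DateTypeCode></gmd:dateType>
  </gmd:CI_Date></gmd:date>
  <gmd:edition><gco:CharacterString>Toestand 5 december 2022</gco:CharacterString></gmd:edition>
  <gmd:editionDate><gco:Date>2022-12-06</gco:Date></gmd:editionDate>
  <gmd:identifier><gmd:MD_Identifier><gmd:code>
   <gco:CharacterString>cd379da9-80a5-4fa4-b456-0990f8c83e0a</gco:CharacterString>
  </gmd:code></gmd:MD_Identifier></gmd:identifier>
 </gmd:CI_Citation></gmd:citation></gmd:MD_DataIdentification></gmd:identificationInfo>
</gmd:MD_Metadata>"""

CITATION = ".//gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/gmd:CI_Citation"


def set_citation_date(root, date_type, new_date):
    """Set gco:Date in each CI_Date whose CI_DateTypeCode codeListValue equals date_type."""
    changed = 0
    for ci_date in root.findall(CITATION + "/gmd:date/gmd:CI_Date", NS):
        code = ci_date.find("gmd:dateType/gmd:CI_DateTypeCode", NS)
        value = ci_date.find("gmd:date/gco:Date", NS)
        if code is not None and value is not None and code.get("codeListValue") == date_type:
            value.text = new_date
            changed += 1
    return changed


def set_citation_text(root, rel_path, new_text):
    """Set the text of one element below CI_Citation; return False if it is missing."""
    elem = root.find(CITATION + "/" + rel_path, NS)
    if elem is None:
        return False
    elem.text = new_text
    return True


root = ET.fromstring(SAMPLE)
set_citation_date(root, "publication", "2022-12-12")  # placeholder date
set_citation_text(root, "gmd:editionDate/gco:Date", "2022-12-12")
set_citation_text(root, "gmd:edition/gco:CharacterString", "PLACEHOLDER EDITION")
set_citation_text(root, "gmd:identifier/gmd:MD_Identifier/gmd:code/gco:CharacterString",
                  "PLACEHOLDER-DATASET-ID")
dates = [d.text for d in root.findall(CITATION + "/gmd:date/gmd:CI_Date/gmd:date/gco:Date", NS)]
print(dates)  # ['2022-12-12', '2022-12-05']
```

ElementTree writes ns0:/ns1: prefixes unless the prefixes are registered, which is why the sketch calls ET.register_namespace up front. Importing the edited ISO file back into the feature class is not shown; check the arcpy.metadata documentation for the import option that matches your XML style.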
