Solved

Update metadata of a feature class in an ArcGIS Pro File Geodatabase

Forum|Forum|3 years ago
December 8, 2022
8 replies
451 views

ireen
Contributor

We have some automated data updates on an Esri File Geodatabase. As part of the update procedure, I also want to update some parts of the metadata automatically: Edition, Publication date, Revision date and Dataset ID.

I succeeded in getting the right information into the XMLUpdater, but the XMLUpdater doesn't seem to change anything in the metadata; I don't see any result. Could anybody help me out? I added the workspace to this question.

Addition (9 December 2022):

I just discovered that if I fill out these fields manually in ArcGIS Pro, this information is not found in the metadata that FME reads. This is what's filled out and shown in ArcGIS Pro:

I used the method with the XMLFormatter to see what's in the metadata; the resulting log is attached. You'll see that the information shown in ArcGIS Pro cannot be found in the XML read by FME.

Exporting the metadata from ArcGIS Pro to XML, however, gives different (and correct) information:

<identificationInfo>
  <MD_DataIdentification>
    <citation>
      <CI_Citation>
        <title>
          <gco:CharacterString>Bedrijventerreinen OSLO, Vlaanderen - Beheerde bedrijvenzone</gco:CharacterString>
        </title>
        <date>
          <CI_Date>
            <date>
              <gco:Date>2022-12-09</gco:Date>
            </date>
            <dateType>
              <CI_DateTypeCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#CI_DateTypeCode" codeListValue="publication" codeSpace="ISOTC211/19115">publication</CI_DateTypeCode>
            </dateType>
          </CI_Date>
        </date>
        <date>
          <CI_Date>
            <date>
              <gco:Date>2022-12-05</gco:Date>
            </date>
            <dateType>
              <CI_DateTypeCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#CI_DateTypeCode" codeListValue="revision" codeSpace="ISOTC211/19115">revision</CI_DateTypeCode>
            </dateType>
          </CI_Date>
        </date>
        <edition>
          <gco:CharacterString>Toestand 5 december 2022</gco:CharacterString>
        </edition>
        <editionDate>
          <gco:Date>2022-12-06</gco:Date>
        </editionDate>
        <identifier>
          <MD_Identifier>
            <code>
              <gco:CharacterString>cd379da9-80a5-4fa4-b456-0990f8c83e0a</gco:CharacterString>
            </code>
          </MD_Identifier>
        </identifier>

Any idea how this is possible? Or any tricks to help me out?

Best answer by boydfme

For a client project I worked on, updates came into an FGDB via an FME ETL process. We found it easier, faster and better practice to use the ArcGIS Pro metadata functions (arcpy.metadata) together with XPath expressions to perform the metadata XML updates. What pushed us in this direction was that the XML edits made in FME seemed to be missing the "save" step (or maybe we were just missing something; not sure). XPath edits are also very efficient.

We put this into a Python shutdown script in the FME workspace that wrote the updates to the FGDB. That way the workspace finished first, and the metadata updates only ran after the FGDB ETL process had finished successfully, so everything stayed in sync.

We used the Python function below to upgrade the metadata (if needed), synchronize it, and then save the feature class metadata out as XML to a scratch location. That XML is then used in the update process further below:

import arcpy
from arcpy import metadata as md


def update_sync_saveOut(source, export_location):
    '''Client needed an automated way to upgrade, synchronize,
    and save as (all content) for each feature class and dataset in their FGDB.
    This function serves as the basis of that automation and follows the Esri
    step-by-step process for performing these actions in ArcGIS Pro. Takes a source
    metadata location and an XML file export location.
    Uses: src_item_md.saveAsXML(export_location, 'EXACT_COPY')
    '''
    text = ''
    text += '##################\n'
    text += str(source) + '\n'
    text += str(export_location) + '\n'

    # Metadata
    src_item_md = md.Metadata(source)
    try:
        # Upgrade; feel free to omit if you don't need this step or don't use the FGDC format
        src_item_md.upgrade('FGDC_CSDGM')
        text += 'Upgraded metadata for {}\n'.format(source)
    except Exception:
        text += 'Upgrade not needed for {}\n'.format(source)
    try:
        # Synchronize
        src_item_md.synchronize('ALWAYS')
        text += 'Synchronized\n'
        # Save out an exact copy as XML
        src_item_md.saveAsXML(export_location, 'EXACT_COPY')
        text += 'Saved out successfully.\n'
    except Exception:
        text += arcpy.GetMessages(0)

    text += arcpy.GetMessages(0) + '\n'
    return text

Once the metadata XML has been saved to that location, you can modify it with XPath expressions like this (adapt it to your needs; we only update the date fields in ours):

import datetime

import arcpy
from arcpy import metadata as md
from lxml import etree as ET

text = ''  # compile all messages along the way

src_xml = r"path to your feature class xml that was saved out from above function"  # raw xml
scratch = r"xml path"  # location to copy your raw xml with removed info to perform edits and saves
fet_path = r"path to the target feature class in the FGDB"  # feature class the edited/updated metadata will be imported into

# Save a filtered copy of the metadata to scratch; 'REMOVE_ALL_SENSITIVE_INFO'
try:
    src_item_md = md.Metadata(src_xml)
    src_item_md.saveAsXML(scratch, 'REMOVE_ALL_SENSITIVE_INFO')  # removes unnecessary geoprocessing history etc.
    text += 'Saved filtered copy (REMOVE_ALL_SENSITIVE_INFO)\n'
except Exception:
    text += arcpy.GetMessages(0)

tree = ET.parse(scratch)

# Your xpaths (ArcGIS metadata format).
# Note: identCode holds the dataset ID, not a date; if it exists, this loop would set it
# to today's date as well, so give it its own value if you need to update it.
dexpaths = ["//idCitation//pubDate", "//idCitation//resEdDate", "//idCitation//reviseDate", "//idCitation//identCode"]

for de in dexpaths:
    try:
        date = tree.xpath(de + '//text()')[0]  # current value; IndexError if the element is missing
        for elem in tree.xpath(de):
            elem.text = datetime.datetime.now().strftime('%Y%m%d')  # ArcGIS-format date: YYYYMMDD
            text += de + ' Changed from: ' + date + ' to: ' + str(elem.text) + '\n'
    except IndexError:
        text += 'Does not have element: {}//text(). Passing...\n'.format(de)

# Save the date edits to the scratch xml
tree.write(scratch, encoding='UTF-8', xml_declaration=True)

# Import the metadata into the FGDB feature class (ArcGIS Pro)
src_template_md = md.Metadata(scratch)
tgt_item_md = md.Metadata(fet_path)
tgt_item_md.copy(src_template_md)
tgt_item_md.save()

text += arcpy.GetMessages(0) + '\n'

Basic message feedback showing the output and a successful modify + import to the feature class:

Saved filtered copy (REMOVE_ALL_SENSITIVE_INFO)
//idCitation//pubDate Changed from: 20220526 to: 20221202
//idCitation//reviseDate Changed from: 20220526 to: 20221202
//idCitation//resEdDate Changed from: 20220526 to: 20221202
Does not have element: //idCitation//identCode//text(). Passing...
Start Time: Friday, December 2, 2022 9:41:00 AM
Succeeded at Friday, December 2, 2022 9:41:00 AM (Elapsed Time: 0.08 seconds)
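The loop above writes today's date into every listed element. Because the original question also needs a dataset ID (not a date), here is a sketch of a variant that maps each XPath to its own new value. It runs on a made-up, in-memory ArcGIS-format fragment instead of a real scratch file, so it needs no arcpy. The `apply_updates` helper and the sample XML are hypothetical; the identCode value is simply the ID from the ISO export quoted in the question. You would add your own XPath for the edition element once you have located it in your file.

```python
import datetime

from lxml import etree as ET

# Hypothetical minimal ArcGIS-format fragment, made up for this sketch.
# The '//' xpaths below match regardless of the exact nesting.
SAMPLE_XML = b"""<metadata>
  <dataIdInfo>
    <idCitation>
      <date>
        <pubDate>20220526</pubDate>
        <reviseDate>20220526</reviseDate>
      </date>
      <resEdDate>20220526</resEdDate>
      <citId><identCode>old-id</identCode></citId>
    </idCitation>
  </dataIdInfo>
</metadata>"""


def apply_updates(tree, updates):
    """Set the text of every element matched by each xpath to that xpath's value.

    Returns a list of log lines; xpaths without a match are reported and skipped.
    """
    log = []
    for xpath, new_value in updates.items():
        matches = tree.xpath(xpath)
        if not matches:
            log.append('Does not have element: {}. Passing...'.format(xpath))
            continue
        for elem in matches:
            old_value = elem.text
            elem.text = new_value
            log.append('{} Changed from: {} to: {}'.format(xpath, old_value, new_value))
    return log


today = datetime.date.today().strftime('%Y%m%d')  # ArcGIS-format dates are YYYYMMDD
updates = {
    '//idCitation//pubDate': today,
    '//idCitation//reviseDate': today,
    '//idCitation//resEdDate': today,
    # The dataset ID is not a date, so it gets its own value.
    '//idCitation//identCode': 'cd379da9-80a5-4fa4-b456-0990f8c83e0a',
}

tree = ET.ElementTree(ET.fromstring(SAMPLE_XML))
for line in apply_updates(tree, updates):
    print(line)
```

In the real script you would parse `scratch` instead of the sample string and fill `updates` from the values FME computed.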


egge
Contributor
December 8, 2022

Did you have a look at the Step-by-step Instructions in this article: Working with Geodatabase Metadata: Writing/Updating Metadata?


ireen
Author
Contributor
December 8, 2022

Yes, I followed the steps in that article, but nothing is updated in the metadata.


debbiatsafe
Safer
December 8, 2022

Hi @ireen

If the "Value Type" field in the XMLUpdater is set to XML/XQuery, an attribute value must be specified in the Value field using the fme:get-attribute("<attributeName>") syntax. If you want to use the @Value(<attributeName>) syntax, please switch "Value Type" to Plain Text.

Edit: it may also be related to how your update features are structured. Can you try having one update feature that carries all four attributes containing the update values? For example, one feature routed to the Update port with the attributes Publication, Toestand, Edition and DatasetIdentificator. It would also help to have a sample of the input data.


ireen
Author
Contributor
Forum|Forum|3 years ago
December 9, 2022

Hi @debbiatsafe,

I tried that first, but it didn't work, so I changed the Value Type to XML/XQuery. I have now set it back to Plain Text, but I still get the same result (nothing is updated).

I just discovered that if I fill out these fields manually in ArcGIS Pro, this information is not found in the metadata that FME reads. This is what's filled out and shown in ArcGIS Pro:

I used the method with the XMLFormatter to see what's in the metadata; the resulting log is attached. You'll see that the information shown in ArcGIS Pro cannot be found in the XML read by FME.

Any idea how this is possible? Or any tricks to help me out?

XMLformatter_log.txt


boydfme
Contributor
Best Answer
December 12, 2022


ireen
Author
Contributor
December 13, 2022

Thank you for sharing your approach to this issue. I'll try to do it this way. Thanks so much!


ireen
Author
Contributor
January 5, 2023


How do I find the XPaths in the XML? In the XML tree I have several look-alike paths for different date types (creation / publication / revision):

So //idCitation//date is ambiguous... How do I find the correct XPaths?


boydfme
Contributor
January 5, 2023

XPath uses path expressions to select nodes or node-sets in an XML document, which you can then use to find or modify text or elements. So some understanding of XML structure and XPath syntax helps.

Example 1:

So say I had this section of a metadata XML and I wanted to pull out the ModDate text (the highlighted part):

The XPath would be "//Esri//ModDate", which returns a list of the matching element objects. To pull out the date text itself (20220429), you add the XPath node test text(), so the XPath becomes "//Esri//ModDate//text()". That returns the date text in a list; if you printed it, you would get ['20220429']. You can then use Python list indexing, [0], to pull out the actual date value, as in my previous example.

So what is this short XPath saying in layman's terms? Find every Esri element anywhere in the document, then every ModDate element anywhere below it, and return the text inside ModDate.
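A minimal, self-contained sketch of the ModDate example, using a made-up one-line fragment in place of the screenshot (only the ModDate value 20220429 comes from the post). It shows the difference between selecting the element objects and selecting their text:

```python
from lxml import etree as ET

# Made-up stand-in for the screenshot; only the ModDate value comes from the post.
xml = b"<metadata><Esri><ModDate>20220429</ModDate></Esri></metadata>"
tree = ET.ElementTree(ET.fromstring(xml))

elements = tree.xpath('//Esri//ModDate')       # list of element objects
texts = tree.xpath('//Esri//ModDate//text()')  # list of text strings
print(elements[0].tag)  # ModDate
print(texts)            # ['20220429']
print(texts[0])         # 20220429
```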

Example 2:

Now let's look at a more deeply nested example where the same element occurs more than once. Say we wanted to add a date to, or modify the text of, the tmPosition element (the 'Unknown' text, underlined in blue):

Following the XML structure here, we would use the XPath '//dataExt//tempEle//exTemp//TM_Instant//tmPosition//text()', which gives us a list of all the tmPosition text values. In this example, we would get ['Unknown', 'Unknown']. Now that we've confirmed the XPath is right, you could update both 'Unknown' values with something like this (using tree, text and the datetime import from the earlier script):

de = '//dataExt//tempEle//exTemp//TM_Instant//tmPosition'
for elem in tree.xpath(de):
    old = elem.text  # previous value, kept for the log message
    elem.text = datetime.datetime.now().strftime('%Y-%m-%d')
    text += de + ' Changed from: ' + str(old) + ' to: ' + str(elem.text) + '\n'

If you only wanted to update, say, the last 'Unknown' element and leave the first one alone, you could use list indexing to take only the last match:

de = '//dataExt//tempEle//exTemp//TM_Instant//tmPosition'
last_elem = tree.xpath(de)[-1]  # last match only; earlier matches are left untouched
last_elem.text = datetime.datetime.now().strftime('%Y-%m-%d')
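XPath can also pick the last match by itself. A sketch using a made-up fragment with two tmPosition elements: wrapping the whole path in parentheses before [last()] selects the last match in the whole document, whereas appending [last()] directly to the final step selects the last tmPosition within each parent, which here is both of them.

```python
import datetime

from lxml import etree as ET

# Made-up fragment with two TM_Instant/tmPosition elements.
xml = b"""<metadata><dataIdInfo><dataExt><tempEle><exTemp>
  <TM_Instant><tmPosition>Unknown</tmPosition></TM_Instant>
  <TM_Instant><tmPosition>Unknown</tmPosition></TM_Instant>
</exTemp></tempEle></dataExt></dataIdInfo></metadata>"""
tree = ET.ElementTree(ET.fromstring(xml))
de = '//dataExt//tempEle//exTemp//TM_Instant//tmPosition'

# [last()] on the final step is evaluated per parent: each TM_Instant has one
# tmPosition, so this still matches both elements.
print(len(tree.xpath(de + '[last()]')))  # 2

# Parentheses first, then [last()]: only the last match in the whole document.
for elem in tree.xpath('(' + de + ')[last()]'):
    elem.text = datetime.date.today().strftime('%Y-%m-%d')

# The first tmPosition keeps 'Unknown'; the last one now holds today's date.
print(tree.xpath(de + '/text()'))
```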

Your example:

Looking at your screenshot, it looks like the xpaths to get the "gco:Date" would be something like this:

'//identificationInfo//MD_DataIdentification//citation//CI_Citation//date//CI_Date//date//gco:Date'

and

'//identificationInfo//MD_DataIdentification//citation//CI_Citation//date//CI_Date//date//gco:Date//text()'

You can play around and confirm by using this code snippet (just be sure to include your path to the xml).

from lxml import etree as ET

md_xml = r"<your path to metadata xml here>"
tree = ET.parse(md_xml)

# lxml only resolves prefixes passed in explicitly, so map 'gco' to its namespace URI.
# If the file declares a default namespace (ISO 19139 uses gmd), unprefixed names such as
# identificationInfo will not match; map that namespace to a prefix (e.g. 'gmd') and use it in the path.
ns = {'gco': 'http://www.isotc211.org/2005/gco'}

de = '//identificationInfo//MD_DataIdentification//citation//CI_Citation//date//CI_Date//date//gco:Date'
de_text = de + '//text()'

text = tree.xpath(de_text, namespaces=ns)  # list of all matching date strings
text1 = text[0]                            # first match only
print(text)
print(text1)

Theoretically, if I got the formatting right, it should give you a list of all the highlighted values you are looking for. Note that this path matches every citation date, so it returns the publication and the revision date together. My thought process using your XML example:

There are a lot of different ways to approach this, but hopefully this helps!
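One more thing worth checking in the ISO 19139 export: the gco: prefix (and, if the file declares one, the default gmd namespace) must be passed to lxml's xpath() through the namespaces argument; otherwise lxml typically raises an XPathEvalError for the undefined prefix or simply finds no matches. The sketch below also addresses the "which date is which" question by filtering CI_Date on its dateType code. It assumes the elements are in the ISO gmd namespace (http://www.isotc211.org/2005/gmd), uses a trimmed copy of the XML quoted in the question, and `date_xpath` is a hypothetical helper.

```python
import datetime

from lxml import etree as ET

# Trimmed copy of the ISO 19139 fragment from the question. The namespace
# declarations are assumed (default namespace = gmd) so the sample parses on its own.
ISO_XML = b"""<MD_Metadata xmlns="http://www.isotc211.org/2005/gmd"
             xmlns:gco="http://www.isotc211.org/2005/gco">
 <identificationInfo><MD_DataIdentification><citation><CI_Citation>
  <date><CI_Date>
   <date><gco:Date>2022-12-09</gco:Date></date>
   <dateType><CI_DateTypeCode codeListValue="publication">publication</CI_DateTypeCode></dateType>
  </CI_Date></date>
  <date><CI_Date>
   <date><gco:Date>2022-12-05</gco:Date></date>
   <dateType><CI_DateTypeCode codeListValue="revision">revision</CI_DateTypeCode></dateType>
  </CI_Date></date>
 </CI_Citation></citation></MD_DataIdentification></identificationInfo>
</MD_Metadata>"""

# XPath 1.0 has no default namespace, so every namespace gets an explicit prefix.
NS = {
    'gmd': 'http://www.isotc211.org/2005/gmd',
    'gco': 'http://www.isotc211.org/2005/gco',
}


def date_xpath(date_type):
    """Hypothetical helper: XPath to the gco:Date of the citation date whose
    dateType code equals date_type ('publication', 'revision', 'creation')."""
    return ("//gmd:identificationInfo//gmd:CI_Citation/gmd:date/gmd:CI_Date"
            "[gmd:dateType/gmd:CI_DateTypeCode/@codeListValue='{}']"
            "/gmd:date/gco:Date").format(date_type)


tree = ET.ElementTree(ET.fromstring(ISO_XML))
print(tree.xpath(date_xpath('publication') + '/text()', namespaces=NS))  # ['2022-12-09']
print(tree.xpath(date_xpath('revision') + '/text()', namespaces=NS))     # ['2022-12-05']

# Update only the publication date; ISO dates are written as YYYY-MM-DD.
today = datetime.date.today().isoformat()
for elem in tree.xpath(date_xpath('publication'), namespaces=NS):
    elem.text = today
```

Filtering on codeListValue rather than on position means the path keeps working if the order of the dates in the file changes.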
