
How to transform CSV to Turtle (.ttl)


Hi, I have a specific use case to solve: transforming a CSV file into a Turtle file (.ttl).

Here is an example:

CSV file has:

 12345, Fred, 2020-01-01, "Foo, Bar"

I would then like to transform this row into:

<http://example.com/12345> rdf:type owl:NamedIndividual ;
                 a:name "Fred" ;
                 a:date "2020-01-01" ;
                 a:ref "http://example.com/Foo",
                       "http://example.com/Bar" .

And then repeat this for every line in the CSV.

Any help appreciated.
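For anyone doing this outside FME, here is a minimal plain-Python sketch of the same transformation. It assumes every row has exactly the four columns shown (id, name, date, refs), reuses `http://example.com/` as the base IRI from the example, and leaves out the `@prefix` declarations for `rdf:`, `owl:` and `a:` that a complete .ttl file would need. The function names are illustrative only.

```python
import csv
import io

BASE = "http://example.com/"  # base IRI taken from the example above


def row_to_turtle(row):
    """Turn one parsed row [id, name, date, refs] into a Turtle block."""
    ident, name, date, refs = row  # assumes exactly four fields
    # A quoted field such as "Foo, Bar" holds several references.
    ref_objects = [f'"{BASE}{r.strip()}"' for r in refs.split(",")]
    return "\n".join([
        f"<{BASE}{ident}> rdf:type owl:NamedIndividual ;",
        f'    a:name "{name}" ;',
        f'    a:date "{date}" ;',
        "    a:ref " + ",\n          ".join(ref_objects) + " .",
    ])


def csv_to_turtle(text):
    # skipinitialspace=True ignores the blank after each comma, so the
    # quoted field "Foo, Bar" is read as a single field.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return "\n\n".join(row_to_turtle(row) for row in reader if row)


print(csv_to_turtle('12345, Fred, 2020-01-01, "Foo, Bar"\n'))
```

The `skipinitialspace` flag matters here: without it, the space after the comma becomes part of the field, the opening quote is no longer recognised, and "Foo, Bar" gets split into two fields.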

 


Best answer by hkingsbury 9 May 2022, 22:09


Have a look at the StringConcatenator. You can use the example above as a template and insert the appropriate attribute values into it:

 

<http://example.com/@Value(id)> rdf:type owl:NamedIndividual ;
                 a:name "@Value(name)" ;
                 a:date "@Value(date)" ;
                 a:ref "http://example.com/@Value(Val1)",
                       "http://example.com/@Value(Val2)" .
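`@Value(...)` is FME's attribute-reference syntax, which the StringConcatenator resolves for each feature. To illustrate the idea outside FME, the sketch below fills the same template from a Python dictionary. `fill_template` and the sample attribute values are hypothetical and only mimic the substitution, not FME itself.

```python
import re

TEMPLATE = """<http://example.com/@Value(id)> rdf:type owl:NamedIndividual ;
    a:name "@Value(name)" ;
    a:date "@Value(date)" ;
    a:ref "http://example.com/@Value(Val1)",
          "http://example.com/@Value(Val2)" ."""

# Matches @Value(attr) and captures the attribute name.
PLACEHOLDER = re.compile(r"@Value\((\w+)\)")


def fill_template(template, attrs):
    """Replace every @Value(attr) with attrs[attr]; a missing key raises KeyError."""
    return PLACEHOLDER.sub(lambda m: attrs[m.group(1)], template)


feature = {"id": "12345", "name": "Fred", "date": "2020-01-01",
           "Val1": "Foo", "Val2": "Bar"}
print(fill_template(TEMPLATE, feature))
```

Because the template hard-codes two references (`Val1`, `Val2`), rows with fewer or more references need extra handling.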

 

 

Thanks - that is very helpful and I got most of it to work...

Two edge cases:

  • Can I detect an empty value (i.e. nothing between the commas) and skip the corresponding output line?
  • How can I check whether a value like "Foo, Bar" contains a comma, and create a separate output line for each value?
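In plain Python (an assumption; the thread itself works in FME), both edge cases come down to splitting the field on commas and emitting a predicate only when at least one value survives. A rough sketch with hypothetical helper names: the ` ;` separators are placed only between the lines that remain, so skipping an empty value never leaves a dangling separator or a misplaced full stop.

```python
def split_values(field):
    """Split "Foo, Bar" into ["Foo", "Bar"]; empty pieces are dropped."""
    return [p.strip() for p in field.split(",") if p.strip()]


def build_subject(subject_iri, pairs):
    """Build one Turtle block from (predicate, [objects]) pairs.

    Pairs whose object list is empty are skipped entirely. The ' ;'
    separators go only between the remaining lines and ' .' ends the
    block. Assumes at least one pair (e.g. rdf:type) is non-empty.
    """
    parts = [f"{pred} {', '.join(objs)}" for pred, objs in pairs if objs]
    return f"<{subject_iri}> " + " ;\n    ".join(parts) + " ."


# Usage: the date is empty, so no a:date line is produced,
# and the three references become objects of a single a:ref.
refs = split_values("Foo, Bar, Me")
print(build_subject("http://example.com/12346", [
    ("rdf:type", ["owl:NamedIndividual"]),
    ("a:name", ['"Mary"']),
    ("a:date", split_values("")),
    ("a:ref", [f'"http://example.com/{r}"' for r in refs]),
]))
```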

@riannella I really like @hkingsbury's approach - very elegant. An alternative might be to use the AttributeExploder, which gives you name/value pairs that you could manipulate into the Turtle format. The AttributeExploder might be more flexible if your CSV attributes change from one dataset to the next.
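AttributeExploder is an FME transformer; a loose Python analogue is to walk each row as (column name, value) pairs, so that new columns become new predicates without code changes. This sketch assumes the CSV has a header row, that the identifier column is named `id` (hypothetical), that rows are no longer than the header, and that column names are valid Turtle local names.

```python
import csv
import io

BASE = "http://example.com/"  # base IRI from the example


def explode_to_turtle(text, id_column="id"):
    """One Turtle block per row; one a:<column> predicate per non-empty cell."""
    blocks = []
    for row in csv.DictReader(io.StringIO(text), skipinitialspace=True):
        lines = [f"<{BASE}{row[id_column]}> rdf:type owl:NamedIndividual"]
        # Each (column name, value) pair plays the role of an exploded attribute.
        for name, value in row.items():
            if name == id_column or not value:
                continue  # skip the subject column and empty/missing cells
            lines.append(f'a:{name} "{value}"')
        blocks.append(" ;\n    ".join(lines) + " .")
    return "\n\n".join(blocks)


print(explode_to_turtle("id,name,date\n12345, Fred, 2020-01-01\n12347, Bill,\n"))
```

The trade-off is that predicate names now come straight from the CSV header, so a renamed column silently changes the output vocabulary.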


Can you provide some example data? In these situations it's often best to look at specific data rather than speculate about its exact format and behavior.


Hi, consider this data:

12345, Fred, 2020-01-01, Foo
12346, Mary, 2020-01-01, "Foo, Bar, Me"
12347, Bill,,

The first line would produce:

<http://example.com/12345> rdf:type owl:NamedIndividual ;
                 a:name "Fred" ;
                 a:date "2020-01-01" ;
                 a:ref "http://example.com/Foo" .

The second line:

<http://example.com/12346> rdf:type owl:NamedIndividual ;
                 a:name "Mary" ;
                 a:date "2020-01-01" ;
                 a:ref "http://example.com/Foo",
                       "http://example.com/Bar",
                       "http://example.com/Me" .

And the third:

<http://example.com/12347> rdf:type owl:NamedIndividual ;
                 a:name "Bill" .

Thanks!
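A sketch of a complete converter for this sample, in plain Python rather than FME. It assumes the column order id, name, date, refs, pads short rows, skips empty values, and splits the quoted refs field on commas. It should yield one block per row, with the third row reduced to its type and name. `to_turtle` is a hypothetical name.

```python
import csv
import io

BASE = "http://example.com/"  # base IRI from the example


def to_turtle(text):
    """Convert rows of id, name, date, refs into Turtle, skipping empty values."""
    blocks = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:
            continue  # blank line in the file
        # Pad short rows so every row has four fields.
        ident, name, date, refs = (row + [""] * 4)[:4]
        parts = ["rdf:type owl:NamedIndividual"]
        if name.strip():
            parts.append(f'a:name "{name.strip()}"')
        if date.strip():
            parts.append(f'a:date "{date.strip()}"')
        # "Foo, Bar, Me" -> three objects for the same a:ref predicate.
        ref_list = [r.strip() for r in refs.split(",") if r.strip()]
        if ref_list:
            objects = ",\n          ".join(f'"{BASE}{r}"' for r in ref_list)
            parts.append(f"a:ref {objects}")
        blocks.append(f"<{BASE}{ident.strip()}> " + " ;\n    ".join(parts) + " .")
    return "\n\n".join(blocks)


sample = ('12345, Fred, 2020-01-01, Foo\n'
          '12346, Mary, 2020-01-01, "Foo, Bar, Me"\n'
          '12347, Bill,,\n')
print(to_turtle(sample))
```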


I also have cases where values such as Fred, Mary and Bill may already be in quotes ("Fred"), so I think I need to remove any existing quotes and then re-add them to be sure.
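Stripping and re-adding quotes works, provided any quote characters left inside the value are escaped so the literal stays valid Turtle. A minimal Python sketch (the function name is hypothetical): it removes one surrounding pair of double quotes if present, then escapes backslashes, double quotes and line breaks. Python's `csv` module already removes the quotes around properly quoted fields, so this mainly matters for values that arrive pre-quoted some other way.

```python
def turtle_literal(value):
    """Quote a value as a Turtle string literal, tolerating pre-quoted input."""
    text = value.strip()
    # Remove one pair of surrounding double quotes if the source had them.
    if len(text) >= 2 and text[0] == text[-1] == '"':
        text = text[1:-1]
    # Escape backslashes first so the escapes added next are not doubled.
    text = text.replace("\\", "\\\\").replace('"', '\\"')
    text = text.replace("\n", "\\n").replace("\r", "\\r")
    return f'"{text}"'


print(turtle_literal('"Fred"'), turtle_literal("Mary"))
```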


Hey, sorry for the delay in getting back to you.

See the attached template :)


Many thanks - that is brilliant!! 😀
