I used SSIS to extract data from SharePoint lists for use elsewhere in the enterprise. For "flat" lists, where we may want to extract data from a single list into a single table, this is straightforward to setup using the SharePoint connector available from CodePlex
I came across a poor decision that was made in the past to use SharePoint version history to store changes in metadata, a temporal representation of lists data over time, expecting this to be extracted for reuse. The original solution directly queried SharePoint internal content database to get list data, so whilst it was reasonably fast, it gave me headache to read and is at risk of being destroyed the next time Microsoft release a SharePoint service pack.
In the past we all did crazy things like stage diving, drinking snake bite and querying SharePoint content databases directly. WE DON'T DO THAT NOW! Well querying SharePoint content databases directly at least.
Using SSIS to replace this we need to:
- Extract the list data from the SharePoint list using the SharePoint SSIS connector, so that we have a table containing “un-versioned” data
- Select the ids of each SharePoint list items into another table and output to a resultset
- Using this table of ids, use a SSIS control flow ForEach loop, iterating a variable mapping, pointing to the resultset configured in step 2
- Assuming that we have a spreadsheet containing the mappings of the SharePoint internal name and the destination column, we make a connection to it
- Now that we have three parameters to make the web service call, the list Id, item id, an internal column name, we can make the web service call.
- You may have noticed that each field and each row are needed to make a single webservice call to get version data, so for instance if you have 10 rows, 5 fields, you will need to make 50 web service calls! There is no batching mechanism, this is is! Perhaps, the reason why Microsoft do not offer any restful mechanism or batching is that using versioning data to store essentially relational data, is so ridiculous that no one would do it.
- The resultant xml is processed using a script task inside a data flow process, into an output buffer
<Versions> <Version PostalAddress="1 Acacia Ave, London" Modified="2015-05-11T15:09:36Z" Editor="…" /> <Version PostalAddress ="1 First Ave, Chicago" Modified="2015-01-27T14:44:30Z" Editor="…" /> <Version PostalAddress ="1 Champs Elysees, Paris" Modified="2015-01-27T13:34:34Z" Editor="…" /> </Versions>
- Finally the outputbuffer is merged into the “unversioned” table in step one