Second Life of a Hungarian SharePoint Geek

May 6, 2007

Hunting the encoding problem in content deployment – part 2

Filed under: Content Deployment, SharePoint | Peter Holpar @ 23:45

Note: This is a repost; the original was published on SharePoint Blogs on May 6, 2007.

In the previous part we saw that special characters in the meta-information of publishing pages are handled incorrectly by the export process (and by content deployment, which is based on the export process). This part shows how we used Lutz Roeder’s .NET Reflector and SQL Server Profiler to examine the internal workings of the export process and identify the likely source of the problem.

Most of the classes related to the export process are located in the Microsoft.SharePoint.Deployment namespace within the Microsoft.SharePoint.dll assembly, so unless specified otherwise you should find the classes I am writing about there.

As we already know, the export package is created by the Run() method of the SPExport class. In this method the export (serialization) of the objects is performed by the call this.SerializeObjects(). Within that method the line serializer.Serialize(deployObject, writer.BaseStream) is responsible for serialization to the target file. The serializer object in this case is an ObjectSerializer:

ObjectSerializer serializer = new ObjectSerializer(deploymentContext);

In the ObjectSerializer(DeploymentStreamingContext deploymentContext) constructor you will find this line:

this.AddSerializationSurrogates(selector, this.m_context);

The AddSerializationSurrogates method is responsible for registering the different types of serializers for the SharePoint object types that should be persisted during the export. The following line registers the serializer for the SPFile object type. Since we had problems with the SPFile Property value in the Manifest.xml, we will follow this track:

selector.AddSurrogate(typeof(SPFile), context, new FileSerializer());

As you can see, SPFile objects are persisted using the FileSerializer object, which derives from DeploymentSerializationSurrogate. DeploymentSerializationSurrogate implements the System.Runtime.Serialization.ISerializationSurrogate interface, so object data is provided to the serializer through the GetObjectData(object obj, SerializationInfo info, StreamingContext context) method.

If you check the definition of this method in the DeploymentSerializationSurrogate class, you can see that if the DataSet property of obj (after casting it to ExportObject, a subclass of DeploymentObject) is not null, the GetDataFromDataSet(object obj, SerializationInfo info, StreamingContext context) method is used during serialization. This is the case for the FileSerializer, so we should check its GetDataFromDataSet method. Within this method the call HandleMetaInfo(objectManager, fileMetaData.ItemArray[8], info, settings) is responsible for persisting the meta-information.

We investigated the SPS content database, created a SQL trace during the export process, and found that the file information can be read through the Docs view (which is based on the AllDocs table). The meta-information is stored in an image data type column called MetaInfo within the view.

The stored procedure proc_DeplGetFileData is used to retrieve file information for deployment. The MetaInfo column is the 9th field in the SELECT statement of this stored procedure. Accordingly, in the GetDataFromDataSet method HandleMetaInfo is called with the 9th field (fileMetaData.ItemArray[8]) of each DataRow from the DataSet of the given ExportObject.

This means that within the HandleMetaInfo method the new MetaInfoHandler instance is created from the value of the MetaInfo column. If you check the constructor of MetaInfoHandler, you can see that the constructor parameter is cast to a byte array when calling the Parse(byte[] propertyBytes) method.
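To make the index mapping concrete, here is a minimal sketch of how the ninth column of a row ends up as a byte array. It does not touch SharePoint at all: the class MetaInfoColumnDemo, its methods, and the placeholder column names Col1 to Col8 are made up for illustration, and only the MetaInfo column name and the ItemArray[8] index come from what we saw in Reflector and the SQL trace.

```csharp
using System;
using System.Data;

public static class MetaInfoColumnDemo
{
    // Builds a one-row table whose ninth column is MetaInfo.
    // Col1..Col8 are placeholders, NOT the real columns returned by proc_DeplGetFileData.
    public static DataTable BuildSampleTable(byte[] metaInfo)
    {
        DataTable table = new DataTable("FileData");
        for (int i = 1; i <= 8; i++)
        {
            table.Columns.Add("Col" + i, typeof(string));
        }
        table.Columns.Add("MetaInfo", typeof(byte[]));

        object[] values = new object[9];
        for (int i = 0; i < 8; i++)
        {
            values[i] = "value" + (i + 1);
        }
        values[8] = metaInfo;
        table.Rows.Add(values);
        return table;
    }

    // ItemArray is zero-based, so index 8 is the ninth field of the row.
    public static byte[] ReadMetaInfo(DataRow row)
    {
        object raw = row.ItemArray[8];
        return raw is DBNull ? new byte[0] : (byte[])raw;
    }

    public static void Main()
    {
        byte[] sample = { 0x61, 0xC5, 0x91 };
        DataTable table = BuildSampleTable(sample);
        byte[] metaInfo = ReadMetaInfo(table.Rows[0]);
        Console.WriteLine(table.Columns[8].ColumnName);
        Console.WriteLine(metaInfo.Length);
    }
}
```

The point is only that whatever reaches the parser is raw bytes, so any decision about character encoding has to be made by the code that turns those bytes into strings.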

Since the MetaInfo is binary data we suspected that the following lines may cause the problem:

property.TheString = new string(chArray, startIndex, (num2 - startIndex) + 1);

And later:

property.Value = new string(chArray, num3, index - num3);

These lines treat parts of MetaInfo as simple ASCII text, converting each byte to a single character. This is incorrect for special characters, since such a character may be encoded as two (or more) bytes, and each of those bytes then becomes a separate, wrong character.

Instead of these, one should use the following lines:

UTF8Encoding utfEncoding = new UTF8Encoding();

property.TheString = utfEncoding.GetString(propertyBytes, startIndex, (num2 - startIndex) + 1);

And similarly:

UTF8Encoding utfEncoding = new UTF8Encoding();

property.Value = utfEncoding.GetString(propertyBytes, num3, index - num3);
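The difference between the two approaches can be reproduced without SharePoint. The following is a minimal sketch, not the decompiled code: the class MetaInfoDecodingDemo and its methods DecodeByteAsChar and DecodeUtf8 are hypothetical names, and DecodeByteAsChar assumes that chArray holds exactly one char per byte, which is how we read the Reflector output.

```csharp
using System;
using System.Text;

public static class MetaInfoDecodingDemo
{
    // Mimics the suspected behaviour: every byte is widened to one char,
    // then a substring is cut out of the resulting char array.
    public static string DecodeByteAsChar(byte[] bytes, int start, int count)
    {
        char[] chars = new char[bytes.Length];
        for (int i = 0; i < bytes.Length; i++)
        {
            chars[i] = (char)bytes[i];
        }
        return new string(chars, start, count);
    }

    // The proposed replacement: decode the same byte range as UTF-8.
    public static string DecodeUtf8(byte[] bytes, int start, int count)
    {
        return new UTF8Encoding().GetString(bytes, start, count);
    }

    public static void Main()
    {
        // "arvizturo" written with Hungarian accented letters, using escapes
        // so the result does not depend on the source file encoding.
        string original = "\u00E1rv\u00EDzt\u0171r\u0151";
        byte[] raw = Encoding.UTF8.GetBytes(original);

        string wrong = DecodeByteAsChar(raw, 0, raw.Length);
        string right = DecodeUtf8(raw, 0, raw.Length);

        Console.WriteLine("bytes: " + raw.Length);
        Console.WriteLine("byte-as-char length: " + wrong.Length);
        Console.WriteLine("UTF-8 length: " + right.Length);
        Console.WriteLine("UTF-8 round trip ok: " + (right == original));
    }
}
```

The sample word has 9 characters but 13 UTF-8 bytes, so the byte-as-char version returns a 13-character string in which every accented letter is replaced by two unrelated characters. Because the char array has one element per byte, the offsets computed on chArray are also valid byte offsets, which is why the same startIndex, num3 and index values can be passed to GetString unchanged.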

We should note that the following line may also cause a problem:

property.Name = new string(chArray, num3, index - num3);

We probably had no issue with that because we don’t use special characters in property names. In Hungary we learned the hard way in previous SharePoint versions that it’s best to avoid accented letters in field, list, view, etc. names.

We tried to prove our theory using a simple console application. For this application we copied the source of the entire MetaInfoHandler, MetaInfoProperty and SerializationInfoHelper classes from .NET Reflector into our project (since all of these classes are declared as internal, we could not use them directly). Then we made a MetaInfoHandlerEx class that is equivalent to MetaInfoHandler except for the above-mentioned code modifications.

In the main program we connected directly to the SPS content database and selected the MetaInfo for a given publishing page that contains special characters in its page content. We tried to use both MetaInfoHandler and MetaInfoHandlerEx classes to get the value of the property.

Our results show that the original version returns the incorrect value but our version handles the special characters correctly.

The code of this application is attached to this post. Don’t forget to adjust the constant values (SQL server name, content database name, name of the publishing page whose content contains the special characters) in the code to match your environment before running the test.

PS: We shared our findings more than a month ago with a local MS consultant with whom we worked on a project and whose responsibility was to help us solve this kind of issue. Unfortunately he told us that he could do nothing with this information.
