Second Life of a Hungarian SharePoint Geek

March 29, 2015

May Merge-SPLogFile Flood the Content Database of the Central Administration Web Application?

Filed under: Content database, PowerShell, SP 2013 — Peter Holpar @ 00:20

In recent weeks we were searching for the cause of a specific error in one of our SharePoint 2013 farms. To get detailed trace information, we often switched the log level to VerboseEx. A few days later the admins alerted us that the size of the Central Administration content database had increased enormously (at that time it was about 80 GB!).

Looking for the source of this unexpected amount of data, I found a document library called Diagnostics Log Files (description: "View and analyze merged log files from the all servers in the farm") that contained 1824 items.

[Screenshots: the Diagnostics Log Files document library in Central Administration, containing 1824 items]

Although I consider myself primarily a SharePoint developer, I always try to stay up to date on infrastructure topics as well, but to tell the truth I had never seen this document library before. Searching the web didn't help either.

Having a look into the document library, I found a set of folders with GUID names.

[Screenshot: folders with GUID names in the Diagnostics Log Files library]

Each of the folders contained a lot of files: a numbered set for each of the servers in the farm.

[Screenshot: a GUID folder containing a numbered set of files for each server in the farm]

The files within the folders are archived logs in .gz file format, each around 26 MB.

[Screenshot: the archived .gz log files, each around 26 MB]

Based on the description of the library, I guessed that it is related to the Merge-SPLogFile cmdlet, which collects ULS log files from all servers in the farm and saves the aggregated result to the specified file on the local system. However, I have not found any documentation on how it performs this action or whether it has anything to do with the content DB of the Central Administration site.
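For reference, this is how the cmdlet is typically invoked; the target path and the filter values below are only illustrative examples, not taken from our farm:

# Load the SharePoint snap-in if the script is not run from the Management Shell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Merge the last hour of High-level entries from all servers in the farm
# into a single local file (the target path is just an example)
Merge-SPLogFile -Path "C:\Temp\FarmMergedLog.log" `
    -StartTime (Get-Date).AddHours(-1) `
    -EndTime (Get-Date) `
    -Level "High" `
    -Overwrite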


After a few hours of "reflectioning" (digging through the decompiled assemblies), it was obvious how this situation had come about. If you are not interested in call chains, feel free to skip the following part.

All of the classes and methods below are defined in the Microsoft.SharePoint assembly, unless specified otherwise.

The InternalProcessRecord method of the Microsoft.SharePoint.PowerShell.SPCmdletMergeLogFile class (Microsoft.SharePoint.PowerShell assembly) creates a new instance of the SPMergeLogFileUtil class based on the target path of the merged log file and the filter expression, and queues its Run method on the thread pool:

SPMergeLogFileUtil util = new SPMergeLogFileUtil(this.Path, filter);
ThreadPool.QueueUserWorkItem(new WaitCallback(util.Run));

In the Run method of the Microsoft.SharePoint.Diagnostics.SPMergeLogFileUtil class:

public void Run(object stateInfo)
{
  try
  {
    this.Progress = 0;
    // Creates the Diagnostics Log Files document library in Central Administration
    // (if it does not exist yet) via the GetLogFilesList method.
    // Adds a new subfolder (with a GUID name) to the library; if the folder
    // already exists, deletes its content.
    // Executes child jobs (SPMergeLogFilesJobDefinition) on each farm member,
    // see more details about them later below.
    List<string> jobs = this.DispatchCollectingJobs();
    // Waits for all child jobs on the farm members to finish.
    this.MonitorJobs(jobs);
    // Merges the content of the collected files from the Central Administration
    // document library into the specified local file by calling the MergeFiles method.
    // Finally deletes the temporary files one by one from the document library
    // and, at the very end, deletes their folder as well.
    this.RunMerge();
  }
  catch (Exception exception)
  {
    this.Error = exception;
  }
  finally
  {
    this.Progress = 100;
  }
}

Yes, as you can see: if there is an error in the file collection process on any of the farm members, or if the merging process fails, the files won't be deleted from the Diagnostics Log Files document library.

The deletion would probably be better placed in the finally block rather than at the end of the RunMerge method; at the very least, the catch block should check whether the files were removed successfully.


A few words about the Microsoft.SharePoint.Diagnostics.SPMergeLogFilesJobDefinition class, as promised earlier. Its Execute method calls the CollectLogFiles method, which creates a ULSLogFileProcessor instance based on the requested log filter, reads the matching ULS entries on the farm member the job is running on, stores them in a temporary file in the local file system, uploads the file to the current subfolder (the one with the GUID name) of the Diagnostics Log Files document library (file name pattern: [ServerLocalName].log.gz if a single file is enough to store the log entries from the server, or [ServerLocalName] (1).log.gz and so on otherwise), and finally deletes the local temporary file.
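Since the job definition class is internal, the simplest way I see to observe these child jobs from PowerShell is to filter the farm's timer jobs by type name. This is only a sketch; the child jobs are transient, so it returns results only while a collection is actually in progress:

# List the transient log-collection child jobs currently registered in the farm
Get-SPTimerJob | ? { $_.GetType().Name -eq "SPMergeLogFilesJobDefinition" } | Select-Object Name, DisplayName, Server, LastRunTime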


We can read a few of the related text values via PowerShell as well.

The getter of the DisplayName property of the SPMergeLogFilesJobDefinition class returns:

SPResource.GetString("CollectLogsJobTitle", new object[] { base.Server.DisplayName });

Reading the value via PowerShell:

[Microsoft.SharePoint.SPResource]::GetString("CollectLogsJobTitle")

returns

Collection of log files from server |0
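The |0 token is the placeholder the server name is substituted into. Passing a value as the second argument should return the substituted title (the server name below is made up):

# Substitute a (hypothetical) server name into the |0 placeholder
[Microsoft.SharePoint.SPResource]::GetString("CollectLogsJobTitle", "SP2013APP01")

which should return something like: Collection of log files from server SP2013APP01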

In the GetLogFilesList method of the SPMergeLogFileUtil class we find the title and description of the Central Administration document library used for log merging:

string title = SPResource.GetString(SPGlobal.ServerCulture, "DiagnosticsLogsTitle", new object[0]);
string description = SPResource.GetString(SPGlobal.ServerCulture, "DiagnosticsLogsDescription", new object[0]);

Reading the values via PowerShell:

[Microsoft.SharePoint.SPResource]::GetString("DiagnosticsLogsTitle")

returns

Diagnostics Log Files

[Microsoft.SharePoint.SPResource]::GetString("DiagnosticsLogsDescription")

returns

View and analyze merged log files from the all servers in the farm

These values support our theory that the library was created and filled by these methods.


Next, I used the following PowerShell script to look for failed merge jobs in the time range in which the files in Central Administration were created:

$farm = Get-SPFarm
# Time range in which the leftover files were created (explicit DateTime casts)
$from = [DateTime]"3/21/2015 12:00:00 AM"
$to = [DateTime]"3/21/2015 6:00:00 AM"
# Failed log-collection child jobs in that range
$farm.TimerService.JobHistoryEntries | ? {($_.StartTime -gt $from) -and ($_.StartTime -lt $to) -and ($_.Status -eq "Failed") -and ($_.JobDefinitionTitle -like "Collection of log files from server *")}

The result:

[Screenshot: the failed "Collection of log files from server …" job history entries]

As you can see, in our case there were errors on both farm member servers; for example, one of them ran out of storage space. After the jobs had failed, the aggregated files were not deleted from the Diagnostics Log Files document library of the Central Administration.
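To gauge how much space such leftovers occupy, you can sum the sizes of the files stuck in the library. A minimal sketch, assuming the default library title, run from a SharePoint Management Shell on a farm server:

# Locate the Central Administration web application and its root web
$caWebApp = Get-SPWebApplication -IncludeCentralAdministration | ? { $_.IsAdministrationWebApplication }
$site = $caWebApp.Sites[0]
$web = $site.RootWeb
$list = $web.Lists["Diagnostics Log Files"]

# Total size of the leftover files, in MB
($list.Items | % { $_.File.Length } | Measure-Object -Sum).Sum / 1MB

$web.Dispose()
$site.Dispose()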

Since even a successful execution of the Merge-SPLogFile cmdlet can temporarily increase the size of the Central Administration content database considerably, and the effect of failed executions is not only temporary (and can be particularly large if it happens several times and is combined with verbose logging), SharePoint administrators should be aware of these facts, consider them when planning database sizes, and/or regularly remove the remains of failed merge processes from the Diagnostics Log Files document library as an extra maintenance step. As far as I can tell, the issue affects SharePoint 2010 as well as 2013.
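As a starting point for such a maintenance step, the sketch below empties the library. It is only an illustration under the same assumptions as above (default library title, sufficient permissions); note that SPListItem.Delete() is permanent and bypasses the recycle bin, so test it carefully before running it against a production farm:

# Locate the Diagnostics Log Files library in Central Administration
$caWebApp = Get-SPWebApplication -IncludeCentralAdministration | ? { $_.IsAdministrationWebApplication }
$site = $caWebApp.Sites[0]
$web = $site.RootWeb
$list = $web.Lists["Diagnostics Log Files"]

if ($list -ne $null) {
    # Snapshot the IDs first: deleting while enumerating a live SPListItemCollection is unreliable
    $fileIds = @($list.Items | % { $_.ID })
    foreach ($id in $fileIds) { $list.GetItemById($id).Delete() }

    # Then remove the (now empty) GUID folders
    $folderIds = @($list.Folders | % { $_.ID })
    foreach ($id in $folderIds) { $list.GetItemById($id).Delete() }
}

$web.Dispose()
$site.Dispose()

Keep in mind that deleting the files frees space inside the content database, but does not shrink the database file itself.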
