Restore a corrupt Word/XML document

Today I ran into a lot of trouble. I was working on a Word 2016 document, happily typing away and saving it to SharePoint Online from time to time.

When I was finished, I decided to share the document and test the link provided to me. Instead of the document, I got an error message. Not too hopeful, I decided to open the document in Word instead of Word Online.

And then it happened….

Error_1

Ok. These things do happen. I guess. And I’m not fainthearted. So, I decided to solve this.

I did. But it has been a struggle. Let me guide you through the process.

Office XML format

The first thing to remember is that Office uses the Office Open XML format. Every .docx document is basically a zip file. The content is stored within, including the document metadata, structure and even some SharePoint metadata.

So, step 1: change the file extension from .docx to .zip.

error_2

Next, it is time to find the culprit. In this case, it is the document.xml file. This can be found in the word folder.

error_3
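If you would rather not rename anything, the same step can be scripted. Below is a minimal Python sketch, assuming the zip archive itself is still readable and only the XML inside is broken. The function name extract_document_xml and any file names you pass in are my own placeholders.

```python
import zipfile
from pathlib import Path


def extract_document_xml(docx_path, out_path):
    """Copy word/document.xml out of a .docx (a zip archive) to out_path.

    Returns the number of bytes written.
    """
    # zipfile looks at the file contents, not the extension,
    # so the .docx does not have to be renamed to .zip first.
    with zipfile.ZipFile(docx_path) as archive:
        data = archive.read("word/document.xml")
    Path(out_path).write_bytes(data)
    return len(data)
```

Work on a copy of the document, so the original stays untouched while you experiment.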

Step 2: Open document.xml with Notepad++

To open this XML file and recover the document, I used Notepad++. I even used the XML plugins, but these weren’t useful to me (in the end). So, let’s open the document and remember the error: Line 2.

error_4

Ok. So the error is in line 2. But line 2 holds all of the XML code, so that tells us next to nothing. And I can’t use the XML plugins, because of the error. We can get around this, thanks to some information on the web.

Step 3: Replace some code

Within the XML file, replace all >< with >\r\n<. Set the search mode in the Replace dialog to Extended, so that \r\n is treated as a line break.

error_5

What this does is put every XML tag on its own line. And this is very handy.

You will see all the XML now.

error_6
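The same replacement can be done outside Notepad++. Here is a small Python sketch of it, working on raw bytes so the file's encoding is left alone. The names split_tags and split_tags_in_file are hypothetical helpers of mine, not part of any tool mentioned here.

```python
from pathlib import Path


def split_tags(xml_bytes):
    """Insert a Windows line break between every pair of adjacent tags."""
    return xml_bytes.replace(b"><", b">\r\n<")


def split_tags_in_file(path):
    """Rewrite the file at path in place, with one line break per '><'."""
    target = Path(path)
    target.write_bytes(split_tags(target.read_bytes()))
```

Only the spots where one tag directly follows another get a line break; text between tags is left as it is.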

Step 4: Save the document.xml back

A weird step, this, but it works. Save the document.xml again, using the Save a copy function. Notepad++ will detect errors in the XML and won’t save the original. But a copy (using the same name) will do the trick.

After saving the file, put it back into the .zip file and rename the .zip file back to .docx.
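Putting the file back can be scripted as well. Python's zipfile module cannot overwrite a single entry in place, so this sketch writes a new archive and swaps in the edited XML along the way. It is a rough illustration under that assumption; replace_document_xml and its parameter names are made up for this example.

```python
import zipfile


def replace_document_xml(src_docx, new_xml_path, dst_docx):
    """Copy src_docx to dst_docx, replacing word/document.xml on the way."""
    with open(new_xml_path, "rb") as handle:
        new_xml = handle.read()
    with zipfile.ZipFile(src_docx) as src, \
            zipfile.ZipFile(dst_docx, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            if item.filename == "word/document.xml":
                # Use the edited XML instead of the broken original.
                dst.writestr(item.filename, new_xml)
            else:
                # Every other part of the package is copied unchanged.
                dst.writestr(item.filename, src.read(item.filename))
```

Writing to a new file name means the broken document is still there if something goes wrong.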

Step 5: Done

Yeah, right. No, it’s not that easy. Now that you’ve saved the document.xml file back into the .docx file, Word will provide us with some more information. Try to open the document again. You will get the same error, but because every tag now sits on its own line, this time it points to the exact line you need.

error_7

In this case, line 720. Don’t bother with the column.
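You can also ask a parser for the line instead of Word. This is a minimal Python sketch, assuming the problem is a well-formedness error (for example a closing tag without its opening tag). Word may reject things a plain XML parser accepts, so an answer of None does not prove the file is fine, and the line reported here may differ from the one Word gives. find_xml_error is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def find_xml_error(xml_bytes):
    """Return (line, column, message) for the first parse error, or None."""
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple.
        line, column = err.position
        return line, column, str(err)
    return None
```

Run it on the contents of document.xml after the tags have been split over separate lines, otherwise the answer is just line 2 again.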

Step 6: Modify the document.xml

Now we know what to look for. Don’t bother trying to find the reason for the error; all the code looked healthy to me. Just go to the line stated.

error_8

Here you will find a line beginning with either </ (a closing tag) or < (an opening tag).

Now comes the hard part. Find the entire element this line belongs to. In my example, the line is </w:pict>, so I have to look for the matching opening <w:pict> tag.

error_9

This part begins at line 653. I select all of the code from line 653 up to and including line 720 and simply delete it.

Now save the file again and place it back into the .zip file. Rename the .zip file to .docx and try to open it. In my case, I got another error message. This one also had to do with <w:pict>. I removed that section as well.
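Hunting for the matching opening tag by eye is tedious, so here is a rough Python sketch of that search. It assumes the tags have already been split over separate lines, that line numbers are 1-based as in Notepad++, and that no attribute value contains a > character. Both function names are my own inventions.

```python
import re


def find_opening_line(lines, closing_line_no, tag="w:pict"):
    """Return the 1-based line number of the opening tag that matches
    the closing tag found on closing_line_no, or None if there is none."""
    close_marker = "</" + tag + ">"
    # Opening tags only: <tag> or <tag attr="...">, not self-closing <tag/>.
    open_pattern = re.compile("<" + re.escape(tag) + r"(?:\s[^<>]*)?(?<!/)>")
    if close_marker not in lines[closing_line_no - 1]:
        raise ValueError("no " + close_marker + " on line " + str(closing_line_no))
    depth = 0
    # Walk upwards, counting closing tags up and opening tags down,
    # so nested elements with the same name are skipped correctly.
    for line_no in range(closing_line_no, 0, -1):
        line = lines[line_no - 1]
        depth += line.count(close_marker)
        depth -= len(open_pattern.findall(line))
        if depth <= 0:
            return line_no
    return None


def delete_lines(lines, first, last):
    """Return a copy of lines without lines first..last (1-based, inclusive)."""
    return lines[: first - 1] + lines[last:]
```

In my example the closing tag on line 720 would be the input, and the opening tag on line 653 is what the search should come back with. Deleting that range is the same thing I did by hand.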

Step 7: Success

In the end, when you have removed all the offending sections, Word will open the document. In my case, no content was harmed using this procedure. And I was very, very happy.

error_10

Success!
