When we copy-paste from a program like Microsoft Word into EditLive, the information is first placed on the clipboard and then inserted into EditLive. Often, we end up with a great deal more on the clipboard than we might expect. A simple line of text might bring with it hundreds of lines' worth of style sheet declarations, metadata, and application-specific data, and we don't want this unnecessary markup making it into our HTML! Thankfully, EditLive offers a clean copy-paste from Microsoft Word right out of the box, ensuring that the code that arrives in the editor is clean and standards-compliant.
Let's explore how this copy-paste operation works, when filters are applied, and how information makes it from a Word document to EditLive.
How does clean copy-paste happen?
The process has four discrete stages, and each is clearly visible within an EditLive debug log. We suggest that you create a debug log of your own (capturing the steps of a copy-paste action) and then follow along with this guide.
The diagram below represents the 4 stages of the process: pasting, filtering, tidying, and insertion.
First, the user copies information from Microsoft Word and then performs a paste action into EditLive. Looking at the debug log, we see that EditLive has received the data, and a simple copy-paste of one line of text has ballooned to over 100 lines of code!
As EditLive interprets the pasted content, the log displays the unfiltered HTML after the following declaration.
13:48:40:842 [DEBUG] EphoxTransferHandler - -(AWT-EventQueue-3) Import data.
13:48:40:843 [DEBUG] EphoxTransferHandler - -(AWT-EventQueue-3) Available data flavors:
[MS Word HTML Redacted - It really is that long!]
The unfiltered content appears a second time as it is actually pulled from the clipboard. No filtering has occurred at this point.
13:48:40:890 [DEBUG] EphoxTransferHandler - -(AWT-EventQueue-3) Clipboard data:
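The "Available data flavors" entry uses the vocabulary of Java's clipboard API, where each format a source application offers is described by a java.awt.datatransfer.DataFlavor. The following is a minimal sketch of how such a list can be produced; it is not EditLive's own code. The class name FlavorLister and the method describeFlavors are hypothetical, and a StringSelection stands in for real clipboard contents so that the example runs without a desktop session.

```java
import java.awt.datatransfer.DataFlavor;
import java.awt.datatransfer.StringSelection;
import java.awt.datatransfer.Transferable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only (not part of EditLive).
class FlavorLister {

    // Returns the MIME type of every data flavor the transferable offers.
    static List<String> describeFlavors(Transferable transferable) {
        List<String> mimeTypes = new ArrayList<>();
        for (DataFlavor flavor : transferable.getTransferDataFlavors()) {
            mimeTypes.add(flavor.getMimeType());
        }
        return mimeTypes;
    }

    public static void main(String[] args) {
        // Assumption: a plain string stands in for what Word would place on the clipboard.
        Transferable standIn = new StringSelection("One line of text");
        for (String mimeType : describeFlavors(standIn)) {
            System.out.println(mimeType);
        }
    }
}
```

On a real desktop, the Transferable would come from Toolkit.getDefaultToolkit().getSystemClipboard().getContents(null), and content copied from Word would typically offer many more flavors than a plain string does.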
EditLive then reads the content and filters out Word-specific markup. Some styling may be preserved, depending on the setup of your configuration file. The resultant HTML (as seen below) is almost right, but still includes unnecessary items. Once this built-in filtering has completed, any paste filters of your own are applied at this stage.
13:48:40:893 [DEBUG] EphoxPasteFilter - -(AWT-EventQueue-3) Clean word
13:48:40:971 [DEBUG] OfficeImportFunction - -(AWT-EventQueue-3) Content before importing MS-Word lists:
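To make the idea of filtering out Word-specific markup concrete, here is a rough sketch of the kind of cleanup involved. It is an assumption-laden illustration and not EditLive's actual filter: the class WordMarkupSketch, the method stripWordMarkup, and the four regular expressions are all hypothetical, and a production filter would use an HTML parser rather than regular expressions.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of Word-markup cleanup; EditLive's real filter is not shown in this article.
class WordMarkupSketch {

    // Conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    private static final Pattern CONDITIONAL_COMMENTS =
            Pattern.compile("<!--\\[if[\\s\\S]*?<!\\[endif\\]-->", Pattern.CASE_INSENSITIVE);

    // Office-namespaced tags such as <o:p>, </o:p>, <w:WordDocument>
    private static final Pattern OFFICE_TAGS =
            Pattern.compile("</?[ovw]:[^>]*>", Pattern.CASE_INSENSITIVE);

    // class attributes whose value starts with "Mso", e.g. class="MsoNormal"
    private static final Pattern MSO_CLASS =
            Pattern.compile("\\s+class=\"?Mso[^\"\\s>]*\"?", Pattern.CASE_INSENSITIVE);

    // style attributes containing any mso- property; the whole attribute is dropped (coarse on purpose)
    private static final Pattern MSO_STYLE =
            Pattern.compile("\\s+style=\"[^\"]*mso-[^\"]*\"", Pattern.CASE_INSENSITIVE);

    static String stripWordMarkup(String html) {
        String result = CONDITIONAL_COMMENTS.matcher(html).replaceAll("");
        result = OFFICE_TAGS.matcher(result).replaceAll("");
        result = MSO_CLASS.matcher(result).replaceAll("");
        result = MSO_STYLE.matcher(result).replaceAll("");
        return result;
    }

    public static void main(String[] args) {
        String pasted = "<p class=\"MsoNormal\" style=\"mso-margin-top-alt:auto\">Hello<o:p></o:p></p>";
        System.out.println(stripWordMarkup(pasted)); // prints <p>Hello</p>
    }
}
```

Note that this sketch discards an entire style attribute as soon as it contains one mso- property, whereas EditLive may preserve some styling depending on your configuration file.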
Finally, EditLive reformats the HTML, which is now free of Word markup, and removes any containing HTML elements that won't be needed. The insert operation is then performed and the content appears in the editor window.
13:48:40:986 [DEBUG] InsertHtmlOperation - -(AWT-EventQueue-3) Filtered content: <p>###Content###</p>
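When following along in your own debug log, it can help to split each entry into its parts so that the stages are easier to spot. The sketch below parses entries shaped like the samples in this article. The format is inferred only from those sample lines, so treat the regular expression as an assumption; the class DebugLogLine and its fields are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for entries shaped like the sample log lines in this article.
class DebugLogLine {

    // Assumed layout: HH:mm:ss:SSS [LEVEL] Source - -(thread) message
    private static final Pattern FORMAT = Pattern.compile(
            "^(\\d{2}):(\\d{2}):(\\d{2}):(\\d{3}) \\[(\\w+)\\] (\\S+) - -\\(([^)]+)\\) (.*)$");

    final long millisOfDay;
    final String level;
    final String source;
    final String thread;
    final String message;

    private DebugLogLine(long millisOfDay, String level, String source, String thread, String message) {
        this.millisOfDay = millisOfDay;
        this.level = level;
        this.source = source;
        this.thread = thread;
        this.message = message;
    }

    // Returns the parsed entry, or null when the line does not match the assumed layout.
    static DebugLogLine parse(String line) {
        Matcher m = FORMAT.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        long hours = Long.parseLong(m.group(1));
        long minutes = Long.parseLong(m.group(2));
        long seconds = Long.parseLong(m.group(3));
        long millis = Long.parseLong(m.group(4));
        long millisOfDay = ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis;
        return new DebugLogLine(millisOfDay, m.group(5), m.group(6), m.group(7), m.group(8));
    }

    public static void main(String[] args) {
        DebugLogLine first = parse(
                "13:48:40:842 [DEBUG] EphoxTransferHandler - -(AWT-EventQueue-3) Import data.");
        DebugLogLine last = parse(
                "13:48:40:986 [DEBUG] InsertHtmlOperation - -(AWT-EventQueue-3) Filtered content: <p>###Content###</p>");
        // Time between receiving the data and inserting the filtered content.
        System.out.println((last.millisOfDay - first.millisOfDay) + " ms"); // prints 144 ms
    }
}
```

Grouping parsed entries by their source field (EphoxTransferHandler, EphoxPasteFilter, OfficeImportFunction, InsertHtmlOperation) gives a quick view of which stage each entry belongs to.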
We’ve seen that copy-pasting from Microsoft Word is a multi-step process, where information moves first to the clipboard, and then into EditLive, where it is filtered before actually arriving in your content. Remember that these precautions are taken to avoid unnecessary markup, to ensure standards-compliant HTML, and to give you the chance to modify the HTML before it arrives in the editor.
But what about other application data that we might want, like track changes?
Unfortunately, EditLive is only able to draw from what an application places onto the clipboard. Things like track changes, page sizes, versioning and owner information just don't make it to the clipboard, and as such are not imported to EditLive.
What if items are making it into the editor that we do not want?
If you need to restrict items (tags, images, etc.) from making it from the clipboard into the editor, we encourage you to create your own paste filter to strip out those elements before they can harm your content.
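As a starting point, the core of such a filter is a function that takes the pasted HTML and returns it without the unwanted elements. The sketch below shows only that string transformation; how a filter is registered with EditLive is not covered in this article and is not shown here. The class ElementStripper and the method removeElements are hypothetical names, and the regular-expression approach is a simplification that does not handle nested elements of the same name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of the transformation a custom paste filter might perform.
class ElementStripper {

    // Removes every listed element from the HTML: paired elements are removed
    // together with their content, and leftover standalone tags (e.g. <img>) are dropped.
    static String removeElements(String html, List<String> tagNames) {
        String result = html;
        for (String tag : tagNames) {
            String name = Pattern.quote(tag);
            // Paired form: <tag ...> ... </tag>
            Pattern paired = Pattern.compile(
                    "<" + name + "\\b[^>]*>[\\s\\S]*?</" + name + "\\s*>", Pattern.CASE_INSENSITIVE);
            // Standalone form: <tag ...>, <tag ... />, or a stray </tag>
            Pattern standalone = Pattern.compile(
                    "</?" + name + "\\b[^>]*>", Pattern.CASE_INSENSITIVE);
            result = paired.matcher(result).replaceAll("");
            result = standalone.matcher(result).replaceAll("");
        }
        return result;
    }

    public static void main(String[] args) {
        String pasted = "<p>Hi<img src=\"a.png\"><script>x()</script></p>";
        System.out.println(removeElements(pasted, Arrays.asList("img", "script"))); // prints <p>Hi</p>
    }
}
```

Removing a paired element together with its content is a deliberate choice here, since keeping the text inside a script or style element would leave stray code in the document.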