Rewrite the styled code in HTML generated by Apple to WordPress compatible HTML

I wrote my first blog post in 2013. At that time, WordPress handled styled code correctly: the code kept its syntax highlighting when I copied it from Xcode / CodeRunner and pasted it into the WordPress editor. The editor preserved the colour information and did a great job of converting the styled code into HTML.

Take this post as an example: https://await.moe/2013/08/assertmacros-problem/. The code shown below

typedef int (*PYStdWriter)(void *, const char *, int);
static PYStdWriter _oldStdWrite;

could be nicely formatted into the corresponding HTML code

<span style="color: #bb2ca2;">typedef</span> <span style="color: #bb2ca2;">int</span> (*PYStdWriter)(<span style="color: #bb2ca2;">void</span> *, <span style="color: #bb2ca2;">const</span> <span style="color: #bb2ca2;">char</span> *, <span style="color: #bb2ca2;">int</span>);
<span style="color: #bb2ca2;">static</span> <span style="color: #4f8187;">PYStdWriter</span> _oldStdWrite;

However, around the time WordPress was upgraded to 3.9, this functionality was removed. There are dozens of syntax highlighting plugins, but I don't really like the colour schemes they offer. Besides, I sometimes need to highlight only a small portion of the code, as in this post: https://ryza.moe/2017/05/the-reason-that-codesign-remove-signature-generates-malformed-macho-still-remains-mystery/

/*
* If this has a code signature load command reuse it and just change
* the size of that data.  But do not use the old data.
*/
if(object->code_sig_cmd != NULL){
    if(object->seg_linkedit != NULL){
        object->seg_linkedit->filesize += arch_signs[i].datasize - object->code_sig_cmd->datasize; 
        if(object->seg_linkedit->filesize > object->seg_linkedit->vmsize)

As you can see, using native HTML code could enable extra control and functionality.

So the first workaround was to install an old version of WordPress on my Mac, copy and paste the code into the old editor, and then copy the formatted HTML out. Undeniably, the process was quite inconvenient: it required me to install MySQL and enable Apache2 on my Mac, which could introduce security issues.

The subsequent workaround was to copy and paste the code into TextEdit, which comes with macOS, and save it as an HTML file. For example, the following C++ code

#include <iostream>

int main(int argc, const char * argv[]) {
    // insert code here...
    std::cout << "Hello, World!\n";
    return 0;
}

would be transformed into

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <meta http-equiv="Content-Style-Type" content="text/css">
  <title></title>
  <meta name="Generator" content="RyzaHTML Writer">
  <meta name="CocoaVersion" content="1671.5">
  <style type="text/css">
    p.p1 {margin: 0.0px 0.0px 0.0px 0.0px; line-height: 14.0px; font: 12.0px Menlo; color: #b50013; -webkit-text-stroke: #b50013; background-color: #ffffff}
    p.p2 {margin: 0.0px 0.0px 0.0px 0.0px; line-height: 14.0px; font: 12.0px Menlo; color: #000000; -webkit-text-stroke: #000000}
    p.p3 {margin: 0.0px 0.0px 0.0px 0.0px; line-height: 14.0px; font: 12.0px Menlo; color: #000000; -webkit-text-stroke: #000000; background-color: #ffffff}
    p.p4 {margin: 0.0px 0.0px 0.0px 0.0px; line-height: 14.0px; font: 12.0px Menlo; color: #425266; -webkit-text-stroke: #425266; background-color: #ffffff}
    span.s1 {font-kerning: none; color: #502a18; -webkit-text-stroke: 0px #502a18}
    span.s2 {font-kerning: none}
    span.s3 {font-kerning: none; color: #870581; -webkit-text-stroke: 0px #870581}
    span.s4 {font-kerning: none; color: #000000; -webkit-text-stroke: 0px #000000}
    span.s5 {font-kerning: none; color: #491187; -webkit-text-stroke: 0px #491187}
    span.s6 {font-kerning: none; color: #1400c4; -webkit-text-stroke: 0px #1400c4}
  </style>
</head>
<body>
<p class="p1"><span class="s1">#include </span><span class="s2">&lt;iostream&gt;</span></p>
<p class="p2"><span class="s2"><br>
</span></p>
<p class="p3"><span class="s3"><b>int</b></span><span class="s2"> main(</span><span class="s3"><b>int</b></span><span class="s2"> argc, </span><span class="s3"><b>const</b></span><span class="s2"> </span><span class="s3"><b>char</b></span><span class="s2"> * argv[]) {</span></p>
<p class="p4"><span class="s4"><span class="Apple-converted-space">    </span></span><span class="s2"><i>// insert code here...</i></span></p>
<p class="p1"><span class="s4"><span class="Apple-converted-space">    </span></span><span class="s5">std</span><span class="s4">::</span><span class="s5">cout</span><span class="s4"> &lt;&lt; </span><span class="s2">"Hello, World!\n"</span><span class="s4">;</span></p>
<p class="p3"><span class="s2"><span class="Apple-converted-space">    </span></span><span class="s3"><b>return</b></span><span class="s2"> </span><span class="s6">0</span><span class="s2">;</span></p>
<p class="p3"><span class="s2">}</span></p>
</body>
</html>

Thus I need to manually inline the styles of all the corresponding CSS classes. Otherwise the colours would be overwritten when I post another article, since the class names always follow the same pattern (p1, p2, s1, s2 and so on).

Manually inlining the classes is as tedious as the previous workaround. So, why not write a program? It first reads the <style>...</style> part and then substitutes the corresponding styles automatically.
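
The idea can be sketched on an in-memory string before turning to the full script. This is only a minimal sketch, not the script itself: the helper names `extract_colours` and `inline_colours` are made up for illustration, and it assumes the TextEdit layout shown above, where every rule sits on one line and class names look like `p1` or `s3`.

```python
import re

# Assumption: each rule sits on one line, as in the TextEdit output above.
RULE_RE = re.compile(r"(?:p|span)\.([ps]\d+)\s*\{([^}]*)\}")
# The text colour only; the lookbehind skips ``background-color``.
COLOUR_RE = re.compile(r"(?<![-\w])color:\s*(#[0-9a-fA-F]{3,6})")


def extract_colours(css):
    """Return {class name: colour} for every rule that sets a text colour."""
    colours = {}
    for name, body in RULE_RE.findall(css):
        found = COLOUR_RE.search(body)
        if found:
            colours[name] = found.group(1)
    return colours


def inline_colours(html, colours):
    """Replace ``class="..."`` with an inline ``style`` where a colour is known."""
    def substitute(match):
        colour = colours.get(match.group(1))
        if colour is None:
            return match.group(0)  # leave classes without a colour untouched
        return 'style="color: {}"'.format(colour)
    return re.sub(r'class="([ps]\d+)"', substitute, html)


css = """
span.s2 {font-kerning: none}
span.s3 {font-kerning: none; color: #870581; -webkit-text-stroke: 0px #870581}
"""
html = '<span class="s3"><b>int</b></span><span class="s2"> main(</span>'
print(inline_colours(html, extract_colours(css)))
```

Rules that set no colour, such as `span.s2`, are simply left alone here, so the enclosing element's colour still applies to them.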

The git repo would be #/codetowp

#!/usr/bin/python3
# -*- coding: utf-8 -*-

import argparse
import re


class AppleGeneratedHTMLCodeRewriter:
    """
    Rewrite the styled code in HTML generated by Apple
    to WordPress compatible native HTML code
    """
    
    def __init__(self, html_file, output_file):
        """
        Rewrite the content from ``html_file`` and output to ``output_file``
        """
        self.html_file = html_file
        if output_file == "-":
            self.output_mode = 1
        else:
            self.output_mode = 0
            self.output = open(output_file, "w")

    def write_output(self, content):
        """
        Write transformed content to file or stdout
        """
        if self.output_mode == 1:
            print(content, end='')
        else:
            self.output.write(content)

    def rewrite(self):
        """
        The actual rewrite implementation
        """
        # Matches a rule such as ``p.p1 {...; color: #b50013; ...}`` and
        # captures the tag, the class name and the text colour.
        style_re = re.compile(r"^\s*(\w+)\.([sp\d]+) .*[ {]color: (#[a-f0-9]+)")
        
        # save styles
        # e.g.,
        # {"p1" : "#000233", "s1" : "#233333"}
        styles = {}
        # replacement strings
        # e.g.,
        # {
        #    "p class=\"p1\"" : "span style=\"color: #000233\"",
        #    "class=\"s1\"" : "style=\"color: #233333\""
        # }
        replacement = {}
        
        # 0: we haven't encountered <style>...</style>
        # 1: Handling
        # 2: the <style>...</style> has been processed
        have_encountered_style = 0
        
        with open(self.html_file) as f:
            for line in f:
                line = line.strip()
                if have_encountered_style != 2:
                    matches = style_re.match(line)
                    if matches is not None:
                        have_encountered_style = 1
                        styles[matches.group(2)] = matches.group(3)
                    elif line.startswith("</style>"):
                        # End of the style block. Rules without a colour
                        # (e.g. ``span.s2 {font-kerning: none}``) are skipped
                        # above instead of ending the scan early.
                        have_encountered_style = 2
                        for name, colour in styles.items():
                            if name.startswith("p"):
                                # <p class="pN"> becomes a coloured <span>
                                origin = "p class=\"{}\"".format(name)
                                replace = "span style=\"color: {}\"".format(colour)
                            else:
                                origin = "class=\"{}\"".format(name)
                                replace = "style=\"color: {}\"".format(colour)
                            replacement[origin] = replace
                        replacement["</p>"] = "</span>"
                else:
                    if len(line) <= 7:
                        continue
                    for r in replacement:
                        line = line.replace(r, replacement[r])
                    self.write_output(line)
                    self.write_output("\n")
        if self.output_mode == 0:
            self.output.close()

def parse_arg():
    """
    parse CLI args
    """
    parser = argparse.ArgumentParser(description="Rewrite styled code in HTML generated by Apple to WordPress compatible native HTML code")
    parser.add_argument("-f", "--file", type=str, help="file path")
    parser.add_argument("-o", "--output", type=str, help="output path, or \"-\" for stdout")
    return parser, parser.parse_args()

def main():
    parser, args = parse_arg()
    if args.file is None or args.output is None:
        parser.print_help()
    else:
        rewriter = AppleGeneratedHTMLCodeRewriter(args.file, args.output)
        rewriter.rewrite()

if __name__ == "__main__":
    main()
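
As an aside on the command-line handling: `argparse` can enforce the two options itself when they are declared with `required=True`, which would make the manual `None` check unnecessary. The sketch below is an alternative, not what the script above does; `build_parser` and the file name `code.html` are invented for the example.

```python
import argparse


def build_parser():
    """Build a parser that rejects a missing -f or -o on its own."""
    parser = argparse.ArgumentParser(
        description="Rewrite styled code in HTML generated by Apple "
                    "to WordPress compatible native HTML code")
    parser.add_argument("-f", "--file", required=True, help="file path")
    parser.add_argument("-o", "--output", required=True,
                        help='output path, or "-" for stdout')
    return parser


# Parsing an explicit list keeps the example independent of sys.argv.
# "code.html" is a hypothetical input file name.
args = build_parser().parse_args(["-f", "code.html", "-o", "-"])
print(args.file, args.output)
```

With this variant, running the parser without `-f` or `-o` prints a usage message and exits instead of falling through to `print_help()`.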
