Proofreading HTTP Server

Writing *correct* English is a hard task for most Japanese people. Microsoft Word provides powerful functionality for proofreading our poor English. However, not everyone likes Microsoft. This is why I created an HTTP server that proofreads text using Microsoft Word's spelling and grammar checker.

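You have to write things like your graduation thesis in English, right? Microsoft Word's proofreading is far better than my own English, but not everyone is fond of Microsoft, so I wanted to pull out just the proofreading feature. So I wrote a simple HTTP server in C# that proofreads text with Microsoft Word and returns the result as JSON.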


The HTTP server runs on a Windows machine (on the right in the figure), and my Mac (on the left in the figure) sends it an HTTP POST request with the text in its body. The server extracts the text and creates a Word Document object through the Word object model to proofread the text *1. Because the result is returned as JSON, the system can easily be extended to other kinds of clients.

http://gyazo.com/3c308ca4744ed9901f4916ee696fe7c8.png
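
To make the request format concrete, here is a minimal sketch of the body round trip. It is written for Python 3 (unlike the Python 2.6 client in this post), and the helper names `build_request_body` and `extract_text` are hypothetical. It assumes, as the code in this post does, that the text is sent as a single form field named `text` and that the server URL-decodes the body and drops the first five characters.

```python
from urllib.parse import urlencode, unquote_plus

# The form field name used by the client, plus "=" (5 characters in total).
FIELD_PREFIX = "text="


def build_request_body(text):
    """Encode the text as a form body, as the client does with urlencode."""
    return urlencode({"text": text})


def extract_text(body):
    """Mirror the server: URL-decode the body, then drop the leading 'text='."""
    decoded = unquote_plus(body)
    return decoded[len(FIELD_PREFIX):]


body = build_request_body("I has a aple & a pen.")
print(body)                # the bytes that go into the POST body
print(extract_text(body))  # what the server hands to Word
```

Because the prefix is removed by position rather than by parsing the form, this only works while `text` is the first and only field in the body.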

The server side, written in C# with Visual Studio 2008:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Net;
using System.Net.Sockets;
using System.Web;

using System.ComponentModel;
using System.Data;

using System.Reflection;
using Word = Microsoft.Office.Interop.Word;
using Microsoft.Office.Interop.Word;
using System.Diagnostics;



namespace ProofreadingHTTPServer
{
    class Program
    {
        static void Main(string[] args)
        {
            IPHostEntry entry = Dns.GetHostEntry(Dns.GetHostName());
            foreach (IPAddress ip in entry.AddressList)
            {
                if (ip.AddressFamily == AddressFamily.InterNetwork)
                {
                    string ipaddress = ip.ToString();
                    message("IP Address: " + ipaddress);
                }
            }

            string prefix = "http://+:8000/";

            HttpListener listener = new HttpListener();
            listener.Prefixes.Add(prefix);
            listener.Start();
            message("Started HTTP server on " + prefix);
            Word.Application app = new Word.Application();
            app.Visible = false;

            // Setting these variables is comparable to passing null to the function.
            // This is necessary because the C# null cannot be passed by reference.
            object template = Missing.Value;
            object newTemplate = Missing.Value;
            object documentType = Missing.Value;
            object visible = true;


            while (true)
            {
                HttpListenerContext context = listener.GetContext();
                HttpListenerRequest req = context.Request;
                HttpListenerResponse res = context.Response;

                string response = "";
                if (!req.HttpMethod.Equals("POST"))
                {
                    message("Client should use POST request");
                    response += "{\"error\":\"Use POST method\"}";
                }
                else
                {
                    message("Processing request...");
                    Word._Document doc1 = app.Documents.Add(ref template, ref newTemplate, ref documentType, ref visible);
                    System.IO.StreamReader reader = new System.IO.StreamReader(req.InputStream, req.ContentEncoding);
                    string URLencodedContent = reader.ReadToEnd();
                    string postedContent = System.Web.HttpUtility.UrlDecode(URLencodedContent);
                    message(postedContent);
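                    // The body is form-encoded as "text=<content>", so skip the 5-character "text=" prefix.
                    // Checking the prefix first avoids an exception on bodies shorter than 5 characters.
                    string targetText = postedContent.StartsWith("text=") ? postedContent.Substring(5) : postedContent;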
                    object s = 0;
                    object e = 0;
                    Range wholeDocument = doc1.Range(ref s, ref e);
                    wholeDocument.Text = targetText;
                    ProofreadingErrors spellErrors = doc1.SpellingErrors;
                    ProofreadingErrors grammaticalErrors = doc1.GrammaticalErrors;

                    int errors = spellErrors.Count;
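                    // The space after "[" is deliberate: the Substring call below always drops one
                    // character, which is the trailing comma, or this space when there are no errors.
                    response += "{\"spellErrors\":[ ";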
                    foreach (Range item in spellErrors)
                    {
                        int start = item.Start;
                        int end = item.End;
                        response += "[" + start + "," + end + "],";
                    }
                    response = response.Substring(0, response.Length-1) + "]";
                    response += ", \"grammaticalErrors\":[ ";
                    foreach (Range item in grammaticalErrors)
                    {
                        int start = item.Start;
                        int end = item.End;
                        response += "[" + start + "," + end + "],";
                    }
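                    response = response.Substring(0, response.Length - 1) + "]";
                    response += "}";

                    // Discard the temporary document so that documents do not pile up in Word.
                    object saveChanges = WdSaveOptions.wdDoNotSaveChanges;
                    object originalFormat = Missing.Value;
                    object routeDocument = Missing.Value;
                    doc1.Close(ref saveChanges, ref originalFormat, ref routeDocument);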
                }

                res.ContentType = "application/json";
                byte[] responseBytes = Encoding.UTF8.GetBytes(response);
                res.OutputStream.Write(responseBytes, 0, responseBytes.Length);
                res.Close();
            }
        }

        static void message(string msg)
        {
            Console.WriteLine(msg);
        }
    }
}
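
The server above assembles its JSON by string concatenation: it appends one `[start,end],` entry per error and then trims the last character before closing each array. As a point of comparison, the following sketch produces the same response shape with a JSON library. It is written for Python 3, `build_response` is a hypothetical helper, and the offsets are made up for illustration.

```python
import json


def build_response(spell_ranges, grammar_ranges):
    """Serialize two lists of (start, end) offsets into the server's JSON shape."""
    return json.dumps({
        "spellErrors": [list(r) for r in spell_ranges],
        "grammaticalErrors": [list(r) for r in grammar_ranges],
    })


# Hypothetical offsets, for illustration only.
raw = build_response([(8, 12)], [])
data = json.loads(raw)
print(data["spellErrors"])
print(data["grammaticalErrors"])
```

Each pair is a character range into the submitted text, so a client can slice the original string with it directly.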


The client side, written in Python 2.6:

from sys import argv
from sys import stdin
from urllib2 import urlopen
from urllib2 import URLError
from urllib import urlencode
from simplejson import loads as jsonloads
from re import search as regex_search
from re import sub as regex_sub

# ANSI escape sequences: start underlining / reset all attributes
underlineMarker = "\033[4m"
resetMarker = "\033[0m"

def convertFileToJSON(file):
	s = file.read()
	return jsonloads(s)

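def fetchJSON(url, txt):
	data = {"text" : txt}
	debug("Sending text to %s..." % url)
	try:
		# Passing a data argument makes urlopen issue a POST request
		res = urlopen(url, urlencode(data))
	except URLError:
		# Without a response there is nothing to parse, so give up here
		debug("URL error occurred")
		exit(1)
	return convertFileToJSON(res)

def debug(m):
	print(m)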



ignore_matches = ["sec:", "multihop"]

def showErrors(errors, text, error_kind = "spelling"):
	margin = 5
	if len(errors) > 0:
		for s, e in errors:
			word = text[s:e]
			ignore = False
			for m in ignore_matches:
				if (regex_search(m, word)):
					ignore = True
			if ignore:
				continue
			startIndex = s - margin if s - margin >= 0 else 0
			endIndex = e + margin if e + margin < len(text) else len(text)
			substr = "..." + text[startIndex:s]
			substr += underlineMarker + text[s:e] + resetMarker
			substr += text[e:endIndex] + "..."
			print "----------------------------------"
			print substr
		print "----------------------------------"
	else:
		debug("No %s errors" % error_kind)

def main():
	if len(argv) != 2:
		debug("specify URL")
		exit(1)

	txt = stdin.read()
	# Replace each newline with a single space so the text stays one paragraph of the same length
	txt = regex_sub(r'\n', " ", txt)
	data = fetchJSON(argv[1], txt)
	print "Spelling Errors:"
	showErrors(data['spellErrors'], txt)
	print "Grammatical Errors:"
	showErrors(data['grammaticalErrors'], txt, "grammatical")


if __name__ == "__main__":
	main()
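
The core of `showErrors` is the step that underlines the offending range and surrounds it with a few characters of context. Here is that step in isolation, as a small sketch for Python 3. The function name `highlight` and the sample sentence are hypothetical; the margin of 5 and the escape sequences are the ones used in the script above.

```python
UNDERLINE = "\033[4m"  # ANSI escape: start underlining
RESET = "\033[0m"      # ANSI escape: reset all attributes


def highlight(text, start, end, margin=5):
    """Return text[start:end] underlined, with up to `margin` characters of context."""
    left = max(start - margin, 0)          # clamp at the beginning of the text
    right = min(end + margin, len(text))   # clamp at the end of the text
    return ("..." + text[left:start]
            + UNDERLINE + text[start:end] + RESET
            + text[end:right] + "...")


sample = "I have a aple in my bag."
print(highlight(sample, 9, 13))
```

On a terminal that understands ANSI escapes, the word `aple` appears underlined with five characters of context on each side.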

This client-side script reads the text from its standard input and takes the server URL as its only argument. For example, on a Mac you can

  1. copy some sentences
  2. pipe the output of pbpaste into the script to send the copied text to the server

http://gyazo.com/43a3c00141cd3f78410b379991aefbbe.png


Although memory management and error handling are still insufficient, my long-cherished wish seems to have come true.

*1: Note that descriptions of grammatical errors are not implemented yet.