How to resolve an XML parsing error in Java with Xerces

2013/12/19 19:41

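Key points of this Q&A

An error occurred while parsing an XML document from Java source code with the Xerces DOM parser.
The error message was "The entity name must immediately follow the '&' in the entity reference".
The asker wants a fix for this error and advice from someone familiar with Java network programming.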

XML parsing in Java with Xerces

I have Java source code that parses an XML document with DOM, using the class org.apache.xerces.parsers.DOMParser as the DOM parser. Running the program below produces the following error, and I cannot work out how to fix it:

    [Fatal Error] :17:109: The entity name must immediately follow the '&' in the entity reference.
    org.xml.sax.SAXParseException; lineNumber: 17; columnNumber: 109; The entity name must immediately follow the '&' in the entity reference.

I would appreciate advice from anyone familiar with Java network programming.

    package nikkei;

    import java.io.ByteArrayInputStream;

    import org.apache.xerces.parsers.DOMParser;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;
    import org.xml.sax.InputSource;

    public class TwitterSearch {
        public static void main(String[] args) throws Exception {
            TwitterSearch search = new TwitterSearch();
            search.search("日経ソフトウエア");
        }

        public void search(String keyword) throws Exception {
            SearchAPIClient client = new SearchAPIClient();
            String xml = client.execute(keyword);
            parse(xml);
        }

        private void parse(String xml) throws Exception {
            DOMParser parser = new DOMParser();
            try {
                parser.parse(new InputSource(new ByteArrayInputStream(xml.getBytes())));
                Document doc = parser.getDocument();
                NodeList entries = doc.getElementsByTagName("entry");
                for (int i = 0; i < entries.getLength(); i++) {
                    String name = null;
                    String tweet = null;
                    Element entry = (Element) entries.item(i);
                    NodeList titleList = entry.getElementsByTagName("title");
                    if (titleList.getLength() == 1) {
                        tweet = titleList.item(0).getTextContent();
                    }
                    NodeList authorList = entry.getElementsByTagName("author");
                    if (authorList.getLength() == 1) {
                        Element author = (Element) authorList.item(0);
                        NodeList nameList = author.getElementsByTagName("name");
                        if (nameList.getLength() == 1) {
                            name = nameList.item(0).getTextContent();
                        }
                    }
                    System.out.println(name + "さんのツイート");
                    System.out.println("\t" + tweet);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

    package nikkei;

    import org.apache.http.HttpEntity;
    import org.apache.http.HttpResponse;
    import org.apache.http.client.HttpClient;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.DefaultHttpClient;
    import org.apache.http.util.EntityUtils;

    public class SearchAPIClient {
        public String execute(String keyword) throws Exception {
            String url = "https://twitter.com/search?q=" + keyword;
            HttpClient httpClient = new DefaultHttpClient();
            HttpGet httpGet = new HttpGet(url);
            HttpResponse response = httpClient.execute(httpGet);
            HttpEntity entity = response.getEntity();
            if (entity != null) {
                return EntityUtils.toString(entity);
            } else {
                return null;
            }
        }
    }

Thank you in advance.

tmiyoshi
Thanks rate: 29% (39/133)

Java
Answers: 1
Thanks: 3

Answers (1)
Best answer chosen by the asker

teketon
Best answer rate: 65% (141/215)

2013/12/19 21:51 Answer No. 1

To state the conclusion first: HTML that contains JavaScript cannot be parsed with a DOM parser. It contains unescaped &, <, and > characters, so it is not well-formed XML. In your position I would use the HTML parser below.

http://jsoup.org/

------- Sample below

    package test;

    import java.net.URLEncoder;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    public class Test {
        public static void main(String[] args) throws Exception {
            Document document = Jsoup.connect("https://twitter.com/search?q=" + URLEncoder.encode("日経ソフトウェア", "utf-8")).get();
            System.out.println(document.getElementsByTag("title"));
        }
    }
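The cause described in this answer can be reproduced without any network access. The following is a minimal sketch, assuming the JDK's built-in JAXP parser (javax.xml.parsers.DocumentBuilder) as a stand-in for org.apache.xerces.parsers.DOMParser; the class name AmpersandDemo and the two sample strings are hypothetical and are not taken from the Twitter page. It feeds the parser script-like text containing a raw "&&" and then the same text with the ampersands escaped.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;

// Hypothetical demo class: checks whether a string is well-formed XML.
class AmpersandDemo {

    // Returns true if the text parses as XML, false if the parser rejects it.
    // The parser's default error handler may also print a "[Fatal Error]" line to stderr.
    public static boolean isWellFormed(String text) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            builder.parse(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXParseException e) {
            // Report where the parser gave up, as in the stack trace from the question.
            System.out.println("line " + e.getLineNumber()
                    + ", column " + e.getColumnNumber() + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample input: inline JavaScript with a raw "&&", as HTML pages often contain.
        String scriptLike = "<html><script>if (a && b) { run(); }</script></html>";
        // The same text with each ampersand escaped as &amp;.
        String escaped = "<html><script>if (a &amp;&amp; b) { run(); }</script></html>";

        System.out.println("raw ampersands parse: " + isWellFormed(scriptLike));
        System.out.println("escaped ampersands parse: " + isWellFormed(escaped));
    }
}
```

An XML parser treats every "&" as the start of an entity reference, so it stops at the first one that is not followed by a name. An HTML parser such as jsoup is built to tolerate this kind of input, which is why the answer recommends it.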

Asker

Thank-you comment 2013/12/20 20:18

Starting from the Document created with jsoup, the following code let me do what I wanted the program in my question to do:

    Elements classes = document.select("[class]");
    for (Element identifier : classes) {
        if (identifier.className().equals("fullname js-action-profile-name show-popup-with-id")) {
            System.out.println(identifier.text() + "さんのツイート");
        }
        if (identifier.className().equals("js-tweet-text tweet-text")) {
            System.out.println("\t" + identifier.text());
        }
    }

It seems that the way Twitter's Search API is used changed considerably with the V1.0 -> V1.1 specification change in March of this year. Previously, it was apparently possible to get the data by DOM parsing with tags such as <author> and <name>, but from V1.1 the format seems to have changed completely, perhaps because JavaScript is now used. Thank you very much.
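For comparison, the DOM approach from the question does work when the input really is well-formed XML with <entry>, <title>, <author>, and <name> elements. The following is a rough sketch under that assumption: the feed in SAMPLE_FEED is a made-up, simplified stand-in (not actual Twitter output, and without the namespace a real Atom feed would declare), the class and method names are hypothetical, and the JDK's built-in DocumentBuilder is used in place of org.apache.xerces.parsers.DOMParser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: extracts author names and titles from an Atom-like feed.
class AtomEntryDemo {

    // Made-up sample data for illustration only; note the escaped ampersand.
    public static final String SAMPLE_FEED =
            "<feed>"
            + "<entry><title>Hello &amp; welcome</title><author><name>alice</name></author></entry>"
            + "<entry><title>Second tweet</title><author><name>bob</name></author></entry>"
            + "</feed>";

    // Returns one "name: title" line per <entry> element.
    public static List<String> extract(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Parsing from a Reader avoids depending on the platform default charset,
        // which xml.getBytes() without an argument would use.
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        List<String> lines = new ArrayList<>();
        NodeList entries = doc.getElementsByTagName("entry");
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            String tweet = firstText(entry, "title");
            String name = firstText(entry, "name"); // found inside <author>
            lines.add(name + ": " + tweet);
        }
        return lines;
    }

    // Text of the first descendant element with the given tag name, or null if none exists.
    private static String firstText(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() > 0 ? list.item(0).getTextContent() : null;
    }

    public static void main(String[] args) throws Exception {
        for (String line : extract(SAMPLE_FEED)) {
            System.out.println(line);
        }
    }
}
```

With the sample feed this prints "alice: Hello & welcome" and "bob: Second tweet". The extraction logic is the same as in the question; only the input differs, which fits the conclusion that the error came from feeding an HTML page to an XML parser.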
