Friday, 26 December 2014

simple sax parser in java

The SAX ContentHandler interface (called DocumentHandler in SAX 1) specifies a number
of “callbacks” that your code must provide. In one sense, this is similar to the
Listener interfaces in AWT and Swing, as covered briefly in Recipe 14.4. The most
commonly used methods are startElement(), endElement(), and characters(). The first
two, obviously, are called at the start and end of an element, and characters() is
called when there is character data. The characters are stored in a large array, and
you are passed that array along with the offset and length of the characters that make
up your text. Conveniently, there is a String constructor that takes exactly these
arguments. Hmmm, I wonder if they thought of that...
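To see the order in which these callbacks arrive, here is a minimal sketch that parses a small in-memory document with the JDK's built-in JAXP SAX parser and records each event. The class name, the trace() method, and the sample XML are made up for illustration. Note that each text event is built with exactly the String(char[], int, int) constructor mentioned above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical demo: record the SAX callbacks fired for a small document. */
public class CallbackTrace {
    static List<String> trace(String xml) throws Exception {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                    String qName, Attributes atts) {
                events.add("start " + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser passes a shared buffer plus an offset and a length
                events.add("text '" + new String(ch, start, length) + "'");
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        for (String e : trace("<person><name>Ian Darwin</name></person>")) {
            System.out.println(e);
        }
    }
}
```

The element events come out properly nested (start person, start name, then the text, end name, end person), which is what makes the simple flag-based handling in the next program work.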
To demonstrate this, I wrote a simple program using SAX to extract the parents' names
and their children from an XML file. The program itself is reasonably simple and is shown here:
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
/**
 * Simple lister - extract name and children tags from a user file. Version for SAX 2.0
 * @version $Id: ch21,v 1.5 2004/05/04 20:13:38 ian Exp $
 */
public class SAXLister {
    /** Run with -Ddebug=true to trace each startElement() call. */
    private static final boolean DEBUG = Boolean.getBoolean("debug");

    public static void main(String[] args) throws Exception {
        new SAXLister(args);
    }

    public SAXLister(String[] args)
            throws SAXException, IOException, ParserConfigurationException {
        // JAXP picks the parser implementation, so no class name is hardcoded
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new File(args.length == 1 ? args[0] : "parents.xml"),
                new PeopleHandler());
    }

    /** Inner class provides the ContentHandler callbacks.
     */
    class PeopleHandler extends DefaultHandler {
        /** Output prefix while inside a name or children element, else null. */
        private String label = null;
        /** characters() may deliver one run of text in several pieces, so collect them. */
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String nsURI, String localName,
                String rawName, Attributes attributes) throws SAXException {
            if (DEBUG) {
                System.err.println("startElement: " + localName + "," + rawName);
            }
            // Consult rawName since we aren't using xmlns prefixes here.
            if (rawName.equalsIgnoreCase("name")) {
                label = "Parent: ";
                text.setLength(0);
            } else if (rawName.equalsIgnoreCase("children")) {
                label = "Children: ";
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (label != null) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String nsURI, String localName, String rawName) {
            // Print only once the whole element body has been seen
            if (label != null && (rawName.equalsIgnoreCase("name")
                    || rawName.equalsIgnoreCase("children"))) {
                System.out.println(label + text.toString().trim());
                label = null;
            }
        }
    }
}
$ java -classpath .:../jars/darwinsys.jar:../jars/xerces.jar SAXLister people.xml
Parent: Ian Darwin
Parent: Another Darwin
$
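SAX does not promise that one run of text arrives in a single characters() call; a parser is free to split it, for example around an entity reference such as &amp;. A minimal sketch of the usual remedy follows: append every chunk to a StringBuilder and use the text only in endElement(). The TextCollector class and its collect() method are hypothetical names, and the sketch assumes elements with the requested name do not nest inside each other.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: collect the full text of every element with a given name. */
public class TextCollector extends DefaultHandler {
    private final String wanted;
    private final StringBuilder buf = new StringBuilder();
    private boolean inside = false;
    final List<String> results = new ArrayList<>();

    TextCollector(String wanted) {
        this.wanted = wanted;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals(wanted)) {
            inside = true;
            buf.setLength(0);                  // start a fresh value for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inside) {
            buf.append(ch, start, length);     // may run several times per element
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (inside && qName.equals(wanted)) {
            results.add(buf.toString());       // the complete value is available now
            inside = false;
        }
    }

    static List<String> collect(String xml, String element) throws Exception {
        TextCollector c = new TextCollector(element);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), c);
        return c.results;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<people><name>Ian &amp; Darwin</name>"
                + "<name>Another Darwin</name></people>";
        System.out.println(collect(xml, "name"));  // prints [Ian & Darwin, Another Darwin]
    }
}
```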

Email client in java

A real email client allows the user considerably more control. Of course, it also
requires more work. In this recipe, I’ll build a simple version of a mail sender, relying
upon the JavaMail standard extension in the packages javax.mail and javax.mail.internet
(the latter contains classes that are specific to Internet email protocols).
This first example shows the steps of sending mail over SMTP, the standard Internet
mail protocol: put the SMTP host name into a Properties object, create a Session from
it, create a MimeMessage in that Session, set its From, To, and CC addresses, subject,
and body text, and finally call Transport.send().
As usual in Java, you must catch certain exceptions. This API requires that you catch
MessagingException, which is thrown when composing or transmitting the message fails.
Class Sender follows:
import java.util.Properties;
import javax.mail.Message;
import javax.mail.MessagingException;
import javax.mail.Session;
import javax.mail.Transport;
import javax.mail.internet.InternetAddress;
import javax.mail.internet.MimeMessage;
/** Sender -- send an email message.
 */
public class Sender {
    /** The message recipient. */
    protected String message_recip = "spam-magnet@darwinsys.com";
    /* What's it all about, Alfie? */
    protected String message_subject = "Re: your mail";
    /** The message CC recipient. */
    protected String message_cc = "nobody@erewhon.com";
    /** The message body */
    protected String message_body =
        "I am unable to attend to your message, as I am busy sunning " +
        "myself on the beach in Maui, where it is warm and peaceful. " +
        "Perhaps when I return I'll get around to reading your mail. " +
        "Or perhaps not.";
    /** The JavaMail session object */
    protected Session session;
    /** The JavaMail message object */
    protected Message mesg;

    /** Do the work: send the mail to the SMTP server. */
    public void doSend() {
        // We need to pass info to the mail server as a Properties, since
        // JavaMail (wisely) allows room for LOTS of properties...
        Properties props = new Properties();
        // Your LAN must define the local SMTP server as "mailhost"
        // for this simple-minded version to be able to send mail...
        props.put("mail.smtp.host", "mailhost");
        // Create the Session object. getInstance() always uses these props;
        // getDefaultInstance() may hand back a shared session created earlier.
        session = Session.getInstance(props, null);
        session.setDebug(true); // Verbose!
        try {
            // create a message
            mesg = new MimeMessage(session);
            // From Address - this should come from a Properties...
            mesg.setFrom(new InternetAddress("nobody@host.domain"));
            // TO Address
            InternetAddress toAddress = new InternetAddress(message_recip);
            mesg.addRecipient(Message.RecipientType.TO, toAddress);
            // CC Address
            InternetAddress ccAddress = new InternetAddress(message_cc);
            mesg.addRecipient(Message.RecipientType.CC, ccAddress);
            // The Subject
            mesg.setSubject(message_subject);
            // Now the message body.
            mesg.setText(message_body);
            // TODO I18N: use setText(msgText.getText(), charset)
            // Finally, send the message!
            Transport.send(mesg);
        } catch (MessagingException ex) {
            // Print the failure itself; MessagingException.getCause() returns the
            // nested (next) exception, so the trace shows the chain as well.
            ex.printStackTrace();
        }
    }

    /** Simple test case driver */
    public static void main(String[] av) {
        Sender sm = new Sender();
        sm.doSend();
    }
}
Of course, a program that can only send one canned message to hardcoded addresses is not
useful in the long run. The second version (not shown here, but in the source tree
accompanying this book) allows the To, From, Mailhost, and Subject to come from the
command line and reads the mail text either from a file or from the standard input.
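Transport.send() hides the SMTP conversation itself. As a rough, hypothetical outline of what that conversation looks like from the client side (RFC 5321), the following sketch only builds the command lines for a message like Sender's and opens no network connection. Real JavaMail sessions typically send EHLO, wait for numbered server replies after each command, and add more headers, so treat this as an illustration rather than what JavaMail literally transmits; the class name, the commands() method, and the "localhost" greeting are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: client commands for a minimal SMTP exchange (no networking). */
public class SmtpDialogue {
    static List<String> commands(String from, String to, String cc,
                                 String subject, String body) {
        List<String> out = new ArrayList<>();
        out.add("HELO localhost");              // greet the server (host name assumed)
        out.add("MAIL FROM:<" + from + ">");    // envelope sender
        out.add("RCPT TO:<" + to + ">");        // envelope recipients: the To address...
        out.add("RCPT TO:<" + cc + ">");        // ...and the CC address, treated alike
        out.add("DATA");                        // server should reply 354, then we send
        out.add("From: " + from);               // To and Cc differ only in these headers
        out.add("To: " + to);
        out.add("Cc: " + cc);
        out.add("Subject: " + subject);
        out.add("");                            // blank line ends the headers
        for (String line : body.split("\n", -1)) {
            // Dot-stuffing: a body line beginning with "." gets an extra "."
            out.add(line.startsWith(".") ? "." + line : line);
        }
        out.add(".");                           // a line holding only "." ends DATA
        out.add("QUIT");
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = commands("nobody@host.domain", "spam-magnet@darwinsys.com",
                "nobody@erewhon.com", "Re: your mail", "Gone to Maui.\n.back soon");
        for (String line : lines) {
            System.out.print(line + "\r\n");    // SMTP lines end in CRLF
        }
    }
}
```

The sketch shows why Sender's CC recipient still receives the mail: at the SMTP level every recipient is just another RCPT TO command.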

convert unicode and string in java

A Java char is a 16-bit UTF-16 code unit, so a single char can hold any character in
the Unicode Basic Multilingual Plane (U+0000 to U+FFFF), which covers most characters in
everyday use; characters beyond U+FFFF (supported since Java 5) take two chars, a
so-called surrogate pair. The charAt() method of String returns one such char. The
StringBuilder append() method has a form that accepts a char. Since char is an integer
type, you can even do arithmetic on chars, though this is not necessary as frequently
as in, say, C. Nor is it often recommended, since the Character class provides methods
for the jobs these operations were normally used for in languages such as C. Here is a
program that uses arithmetic on chars to control a loop, and also appends the
characters into a StringBuilder (see Recipe 3.3):
/**
 * Conversion between Unicode characters and Strings
 */
public class UnicodeChars {
    public static void main(String[] argv) {
        StringBuilder b = new StringBuilder();
        // char is an integer type, so it can drive a loop: 'a', 'b', 'c'
        for (char c = 'a'; c < 'd'; c++) {
            b.append(c);
        }
        b.append('\u00a5'); // Yen sign
        b.append('\u01FC'); // Latin capital AE with acute accent
        b.append('\u0391'); // GREEK Capital Alpha
        b.append('\u03A9'); // GREEK Capital Omega
        for (int i = 0; i < b.length(); i++) {
            System.out.println("Character #" + i + " is " + b.charAt(i));
        }
        System.out.println("Accumulated characters are " + b);
    }
}
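Since char is an integer type, the C-style tricks still compile, but the Character class usually expresses the intent better and covers more of Unicode. This sketch (the class and method names are illustrative) compares the two approaches and also shows the limit of a 16-bit char: U+1D11E MUSICAL SYMBOL G CLEF lies beyond U+FFFF and is stored as two chars.

```java
/** Hypothetical demo: char arithmetic versus Character methods. */
public class CharMath {
    /** C idiom: correct only for the ASCII digits '0' to '9'. */
    static int cStyleDigit(char c) {
        return c - '0';
    }

    /** Library call: accepts any Unicode decimal digit, returns -1 otherwise. */
    static int libraryDigit(char c) {
        return Character.digit(c, 10);
    }

    public static void main(String[] args) {
        System.out.println(cStyleDigit('7') + " " + libraryDigit('7'));           // 7 7
        // U+0667 ARABIC-INDIC DIGIT SEVEN: only the library call gets it right
        System.out.println(cStyleDigit('\u0667') + " " + libraryDigit('\u0667')); // 1591 7
        // Arithmetic upper-casing works for ASCII letters only
        System.out.println((char) ('q' - 'a' + 'A'));                              // Q
        // Character.toUpperCase also handles Greek: small omega to capital omega
        System.out.println(Character.toUpperCase('\u03c9') == '\u03a9');          // true
        // A character beyond U+FFFF needs a surrogate pair
        String clef = new String(Character.toChars(0x1D11E));
        System.out.println(clef.length());                                         // 2
        System.out.println(clef.codePointCount(0, clef.length()));                 // 1
    }
}
```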
When you run UnicodeChars, the expected results are printed for the ASCII characters. On
my Unix system, the default fonts don’t include all the additional characters, so they
are either omitted or replaced by unrelated characters (Recipe 13.3 shows how to draw
text in other fonts):
C:\javasrc\strings>java UnicodeChars
Character #0 is a
Character #1 is b
Character #2 is c
Character #3 is %
Character #4 is |
Character #5 is
Character #6 is )
Accumulated characters are abc%|)
My Windows system doesn’t have most of those characters either, but at least it
prints the ones it knows are lacking as question marks (Windows system fonts are
more homogeneous than those of the various Unix systems, so it is easier to know
what won’t work). On the other hand, it tries to print the Yen sign as a Spanish
capital Enye (Ñ, an N with a tilde over it). Amusingly, if I capture the console log
under Windows into a file and display it under Unix, the Yen symbol now appears:
Character #0 is a
Character #1 is b
Character #2 is c
Character #3 is ¥
Character #4 is ?
Character #5 is ?
Character #6 is ?
Accumulated characters are abc¥???
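The Yen-to-Ñ switch in particular looks like a character-encoding effect: System.out turns each char into bytes using a default charset, and whatever displays those bytes may decode them with a different one. The sketch below reproduces the captured log using ISO-8859-1, which the JDK always provides and which agrees with the usual Windows-1252 default for these characters: ¥ becomes the single byte 0xA5, and the characters with no mapping become '?'. In the DOS console code pages 437 and 850, byte 0xA5 is Ñ, which would explain the console display; the IBM850 lookup assumes the runtime ships that charset, so it is guarded with Charset.isSupported(). The class and method names are illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: the same byte shown as a Yen sign, N-tilde, or '?'. */
public class EncodingDemo {
    /** Encode with ISO-8859-1 (unmappable chars become '?'), then decode again. */
    static String roundTripLatin1(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String text = "abc\u00a5\u01fc\u0391\u03a9";   // same characters as UnicodeChars
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        System.out.printf("Yen sign encodes as 0x%02X%n", bytes[3] & 0xFF);  // 0xA5
        System.out.println(roundTripLatin1(text).equals("abc\u00a5???"));    // true
        if (Charset.isSupported("IBM850")) {
            // The same byte read with a DOS console code page
            String dos = new String(new byte[] { (byte) 0xA5 }, Charset.forName("IBM850"));
            System.out.println(dos.equals("\u00d1"));  // true: capital N with tilde
        }
    }
}
```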