Abstract: This article describes how to parse XML documents with StAX.
Environment
Windows 10 Enterprise LTSC 21H2
Java 1.8
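1 Definition
StAX (Streaming API for XML) is a stream-based XML parsing API. StAX parses in pull mode: the application drives the parsing process and asks for the next parsing event only when it needs one.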
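2 Basic principles
The core workflow of a StAX parser:
- Create the parser: use the XMLInputFactory factory to create an XMLStreamReader or XMLEventReader instance.
- Pull actively: the application calls the reader to obtain the next parsing event.
- Handle the event: process the data according to the event type.
- Control state: the application fully controls parsing progress and can stop or skip content at any time.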
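The four steps above can be sketched as follows. This is a minimal illustration rather than the article's own code: the class name `PullWorkflowSketch`, the method `firstName`, and the inline sample document are assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, used only to illustrate the workflow.
final class PullWorkflowSketch {

    /** Returns the text of the first <name> element, or null if there is none. */
    static String firstName(String xml) throws XMLStreamException {
        // Step 1: create the parser through the factory.
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            // Step 2: the application pulls events one at a time.
            while (reader.hasNext()) {
                int event = reader.next();
                // Step 3: react to the event type.
                if (event == XMLStreamConstants.START_ELEMENT
                        && "name".equals(reader.getLocalName())) {
                    // Step 4: stop early; no further events are requested.
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<school><name>Sunshine Primary School</name><teachers/></school>";
        System.out.println(firstName(xml));
    }
}
```

Returning from inside the loop is what "controlling parsing progress" means in practice: the application simply stops asking for events.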
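Key characteristics:
- Pull-mode parsing: the application controls the parsing flow.
- Read and write support: the API covers both reading and writing XML (each stream is still traversed forward only).
- Memory efficient: the whole document does not need to be loaded into memory.
- Flexible: parsing can be stopped, or content skipped, at any time.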
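Skipping content is not a built-in StAX call; one common way to do it is to count element depth while pulling events. The sketch below assumes a hypothetical helper `skipElement` and a hypothetical method `namesOutside`; neither comes from the article.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class illustrating how a subtree can be skipped.
final class SkipSubtreeSketch {

    /** Consumes events until the element the reader is positioned on has been closed. */
    static void skipElement(XMLStreamReader reader) throws XMLStreamException {
        int depth = 1; // the START_ELEMENT we are positioned on is already open
        while (depth > 0 && reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
    }

    /** Collects the text of every <name> element that is not inside the skipped element. */
    static List<String> namesOutside(String xml, String skippedElement) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                String element = reader.getLocalName();
                if (skippedElement.equals(element)) {
                    skipElement(reader); // ignore the whole subtree
                } else if ("name".equals(element)) {
                    names.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }
}
```

Calling `namesOutside(xml, "teachers")` on a document shaped like the school example would return the school name and the student names while never inspecting the teacher entries.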
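StAX offers two programming models:
- The cursor-based model (XMLStreamReader): next() advances the cursor and returns the event type, and the content is then read from the reader itself. It is lower level and generally faster.
- The iterator-based model (XMLEventReader): nextEvent() returns an event object that carries the event information. It is more object-oriented and simpler to use.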
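To make the difference between the two models concrete, the sketch below performs the same task (counting elements with a given name) once per model. The class and method names are illustrative assumptions.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.events.XMLEvent;

// Hypothetical class comparing the cursor and iterator models on one task.
final class TwoModelsSketch {

    /** Cursor model: next() returns an int, details are read from the reader. */
    static int countWithCursor(String xml, String elementName) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && elementName.equals(reader.getLocalName())) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    /** Iterator model: nextEvent() returns an XMLEvent object that carries the details. */
    static int countWithEvents(String xml, String elementName) throws XMLStreamException {
        XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        int count = 0;
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()
                        && elementName.equals(event.asStartElement().getName().getLocalPart())) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }
}
```

The cursor version allocates no event objects, which is where its performance advantage comes from; the iterator version yields events that can be stored or passed around.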
3 Sample document
The examples use a simple XML document:
school.xml
```xml
<?xml version="1.0" encoding="UTF-8"?>
<school address="Haidian District, Beijing">
    <name>Sunshine Primary School</name>
    <teachers>
        <teacher id="teacher_1">
            <name>Li Ming</name>
            <subject>Chinese</subject>
        </teacher>
        <teacher id="teacher_2">
            <name>Zhao Qiang</name>
            <subject>Math</subject>
        </teacher>
    </teachers>
    <students>
        <student id="student_1">
            <name>Zhang Ting</name>
            <gender>female</gender>
            <age>13</age>
            <hobbies>
                <hobby>painting</hobby>
                <hobby>piano</hobby>
            </hobbies>
        </student>
        <student id="student_2">
            <name>Wang Hao</name>
            <gender>male</gender>
            <age>14</age>
            <hobbies>
                <hobby>running</hobby>
                <hobby>swimming</hobby>
            </hobbies>
        </student>
    </students>
</school>
```
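4 Core components
4.1 Factory classes
4.1.1 XMLInputFactory
Common methods:
```java
// Obtain a factory instance.
public static XMLInputFactory newInstance();

// Create a cursor-based reader.
public abstract XMLStreamReader createXMLStreamReader(InputStream stream) throws XMLStreamException;
public abstract XMLStreamReader createXMLStreamReader(Reader reader) throws XMLStreamException;

// Create an iterator-based reader.
public abstract XMLEventReader createXMLEventReader(InputStream stream) throws XMLStreamException;
public abstract XMLEventReader createXMLEventReader(Reader reader) throws XMLStreamException;

// Read or set a factory property.
public abstract Object getProperty(String name);
public abstract void setProperty(String name, Object value);
```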
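As an illustration of `setProperty` and `getProperty`, the sketch below configures a factory with three of the standard property constants defined on XMLInputFactory. The chosen values (DTD support and external entities off, coalescing on) are an assumption about a reasonable setup for untrusted input, not a configuration taken from the article.

```java
import javax.xml.stream.XMLInputFactory;

// Hypothetical helper showing how factory properties are set and read back.
final class InputFactoryConfigSketch {

    static XMLInputFactory newConfiguredFactory() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Assumed policy: do not process DTDs or resolve external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        // Deliver adjacent character data as a single CHARACTERS event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, true);
        return factory;
    }

    public static void main(String[] args) {
        XMLInputFactory factory = newConfiguredFactory();
        System.out.println("coalescing: " + factory.getProperty(XMLInputFactory.IS_COALESCING));
    }
}
```

Properties apply to readers created after they are set, so configuration belongs before the first `createXMLStreamReader` or `createXMLEventReader` call.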
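4.1.2 XMLOutputFactory
Common methods:
```java
// Obtain a factory instance.
public static XMLOutputFactory newInstance();

// Create a stream writer.
public abstract XMLStreamWriter createXMLStreamWriter(OutputStream stream) throws XMLStreamException;
public abstract XMLStreamWriter createXMLStreamWriter(Writer writer) throws XMLStreamException;
```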
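4.2 Reader classes
4.2.1 XMLStreamReader
Common methods:
```java
// Advance to the next event and return its type.
public int next() throws XMLStreamException;
// Check whether more events are available.
public boolean hasNext() throws XMLStreamException;
// Type of the event the cursor is currently on.
public int getEventType();

// Name of the current element.
public QName getName();
public String getLocalName();

// Text of the current CHARACTERS event.
public String getText();

// Attributes of the current START_ELEMENT.
public int getAttributeCount();
public QName getAttributeName(int index);
public String getAttributeLocalName(int index);
public String getAttributeValue(int index);

// Namespaces declared on the current element.
public int getNamespaceCount();
public String getNamespaceURI(int index);
```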
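The attribute accessors are index based and only valid while the cursor is on a START_ELEMENT. A minimal sketch of their use, with a hypothetical class `AttributeSketch` and method `attributesOf`:

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper that reads all attributes of the first matching element.
final class AttributeSketch {

    static Map<String, String> attributesOf(String xml, String elementName)
            throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        Map<String, String> attributes = new LinkedHashMap<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && elementName.equals(reader.getLocalName())) {
                    // Attributes are addressed by index on the current START_ELEMENT.
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        attributes.put(reader.getAttributeLocalName(i),
                                reader.getAttributeValue(i));
                    }
                    break; // only the first match is needed
                }
            }
        } finally {
            reader.close();
        }
        return attributes;
    }
}
```

When only one known attribute is needed, `getAttributeValue(null, "id")` (as used later in this article) avoids the loop.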
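Event type constants (declared in XMLStreamConstants and inherited by XMLStreamReader):
```java
XMLStreamReader.START_ELEMENT   // start tag of an element
XMLStreamReader.END_ELEMENT     // end tag of an element
XMLStreamReader.CHARACTERS      // text content
XMLStreamReader.START_DOCUMENT  // start of the document
XMLStreamReader.END_DOCUMENT    // end of the document
```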
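The constants are plain `int` values, so a small mapping helps when logging the event sequence. The sketch below is illustrative; `EventTraceSketch`, `describe`, and `trace` are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper that records the event sequence of a document.
final class EventTraceSketch {

    static String describe(int eventType) {
        switch (eventType) {
            case XMLStreamConstants.START_DOCUMENT: return "START_DOCUMENT";
            case XMLStreamConstants.END_DOCUMENT:   return "END_DOCUMENT";
            case XMLStreamConstants.START_ELEMENT:  return "START_ELEMENT";
            case XMLStreamConstants.END_ELEMENT:    return "END_ELEMENT";
            case XMLStreamConstants.CHARACTERS:     return "CHARACTERS";
            default:                                return "OTHER(" + eventType + ")";
        }
    }

    static List<String> trace(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> events = new ArrayList<>();
        try {
            // A new reader is positioned on START_DOCUMENT before the first next() call.
            events.add(describe(reader.getEventType()));
            while (reader.hasNext()) {
                events.add(describe(reader.next()));
            }
        } finally {
            reader.close();
        }
        return events;
    }
}
```

For the input `<a>hi</a>` the trace is START_DOCUMENT, START_ELEMENT, CHARACTERS, END_ELEMENT, END_DOCUMENT.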
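4.2.2 XMLEventReader
Common methods:
```java
// Return the next event and advance.
public XMLEvent nextEvent() throws XMLStreamException;

// Check whether more events are available.
public boolean hasNext();

// Look at the next event without consuming it.
public XMLEvent peek() throws XMLStreamException;
```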
4.2.3 XMLEvent
Common methods:
```java
// Type of this event (one of the XMLStreamConstants values).
public int getEventType();

// Element events.
public boolean isStartElement();
public boolean isEndElement();

// Text events.
public boolean isCharacters();

// Document events.
public boolean isStartDocument();
public boolean isEndDocument();
```
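A short sketch of how `peek()` and the XMLEvent type checks combine: it reports elements that have no content by checking whether a start element is immediately followed by an end element. The class and method names are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper that lists elements without any content.
final class PeekSketch {

    static List<String> emptyElements(String xml) throws XMLStreamException {
        XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                // peek() inspects the following event without consuming it.
                if (event.isStartElement()
                        && reader.hasNext()
                        && reader.peek().isEndElement()) {
                    names.add(event.asStartElement().getName().getLocalPart());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }
}
```

Look-ahead of this kind is awkward with the cursor model, because the cursor has already moved by the time the next event type is known.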
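4.3 Writer classes
4.3.1 XMLStreamWriter
Common methods:
```java
// Document boundaries.
public void writeStartDocument() throws XMLStreamException;
public void writeEndDocument() throws XMLStreamException;

// Elements.
public void writeStartElement(String localName) throws XMLStreamException;
public void writeEndElement() throws XMLStreamException;
public void writeEmptyElement(String localName) throws XMLStreamException;

// Text content.
public void writeCharacters(String text) throws XMLStreamException;

// Attributes and namespace declarations of the current start tag.
public void writeAttribute(String localName, String value) throws XMLStreamException;
public void writeNamespace(String prefix, String namespaceURI) throws XMLStreamException;
```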
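The writer methods are order sensitive: attributes have to be written directly after the start tag they belong to, before any content. A minimal sketch that writes into a string, using a hypothetical class `StreamWriterSketch`:

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper that serializes one student element to a string.
final class StreamWriterSketch {

    static String writeStudent(String id, String name) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        try {
            writer.writeStartDocument();
            writer.writeStartElement("student");
            writer.writeAttribute("id", id);      // must directly follow the start tag
            writer.writeStartElement("name");
            writer.writeCharacters(name);         // reserved characters are escaped
            writer.writeEndElement();             // closes <name>
            writer.writeEmptyElement("hobbies");  // element without content
            writer.writeEndElement();             // closes <student>
            writer.writeEndDocument();
            writer.flush();
        } finally {
            writer.close();
        }
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(writeStudent("student_1", "Tom & Jerry"));
    }
}
```

The writer does no pretty-printing; any line breaks and indentation have to be written explicitly with `writeCharacters`.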
5 Practical usage
5.1 Basic usage
5.1.1 Parsing with XMLStreamReader
Create the parser and parse the document:
```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Student is the article's model class; its definition is not shown here.
public List<Student> parseWithStreamReader(String filePath) {
    List<Student> students = new ArrayList<>();
    // XMLStreamReader.close() does not close the underlying stream,
    // so the stream is managed by try-with-resources.
    try (InputStream in = new FileInputStream(filePath)) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        try {
            Student currentStudent = null;
            StringBuilder currentText = new StringBuilder();

            if (reader.getEventType() == XMLStreamReader.START_DOCUMENT) {
                System.out.println("Document parsing started");
            }
            while (reader.hasNext()) {
                int eventType = reader.next();
                switch (eventType) {
                    case XMLStreamReader.START_ELEMENT:
                        if ("student".equals(reader.getLocalName())) {
                            currentStudent = new Student();
                            currentStudent.setId(reader.getAttributeValue(null, "id"));
                        }
                        break;
                    case XMLStreamReader.CHARACTERS:
                        // Text may arrive in several chunks, so accumulate it.
                        if (!reader.isWhiteSpace()) {
                            currentText.append(reader.getText());
                        }
                        break;
                    case XMLStreamReader.END_ELEMENT:
                        String elementName = reader.getLocalName();
                        String text = currentText.toString().trim();
                        // Only elements inside a <student> are of interest.
                        if (currentStudent != null) {
                            switch (elementName) {
                                case "name":
                                    currentStudent.setName(text);
                                    break;
                                case "gender":
                                    currentStudent.setGender(text);
                                    break;
                                case "age":
                                    if (!text.isEmpty()) {
                                        currentStudent.setAge(Integer.parseInt(text));
                                    }
                                    break;
                                case "hobby":
                                    currentStudent.addHobby(text);
                                    break;
                                case "student":
                                    students.add(currentStudent);
                                    currentStudent = null;
                                    break;
                                default:
                                    break;
                            }
                        }
                        currentText.setLength(0);
                        break;
                    default:
                        break;
                }
            }
            if (reader.getEventType() == XMLStreamReader.END_DOCUMENT) {
                System.out.println("Document parsing finished");
            }
        } finally {
            try {
                reader.close();
            } catch (XMLStreamException e) {
                e.printStackTrace();
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
        return Collections.emptyList();
    }
    return students;
}
```
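The parsing and writing methods in this section rely on a Student class that the article does not show. The sketch below is an assumed minimal version, inferred only from the accessors the code calls (`setId`, `setName`, `setGender`, `setAge`, `addHobby`, and the matching getters); the real class may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Assumed model class, reconstructed from the method calls used in this article.
final class Student {
    private String id;
    private String name;
    private String gender;
    private int age;
    private final List<String> hobbies = new ArrayList<>();

    String getId() { return id; }
    void setId(String id) { this.id = id; }

    String getName() { return name; }
    void setName(String name) { this.name = name; }

    String getGender() { return gender; }
    void setGender(String gender) { this.gender = gender; }

    int getAge() { return age; }
    void setAge(int age) { this.age = age; }

    List<String> getHobbies() { return hobbies; }
    void addHobby(String hobby) { hobbies.add(hobby); }

    @Override
    public String toString() {
        return "Student{id=" + id + ", name=" + name + ", gender=" + gender
                + ", age=" + age + ", hobbies=" + hobbies + "}";
    }
}
```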
5.1.2 Parsing with XMLEventReader
Create the parser and parse the document:
```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

public List<Student> parseWithEventReader(String filePath) {
    List<Student> students = new ArrayList<>();
    // The reader does not close the underlying stream, so manage it here.
    try (InputStream in = new FileInputStream(filePath)) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Deliver each text node as one Characters event instead of several chunks.
        factory.setProperty(XMLInputFactory.IS_COALESCING, true);
        XMLEventReader reader = factory.createXMLEventReader(in);
        try {
            Student currentStudent = null;
            String currentElement = null;

            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartDocument()) {
                    System.out.println("Document parsing started");
                }
                if (event.isStartElement()) {
                    StartElement startElement = event.asStartElement();
                    currentElement = startElement.getName().getLocalPart();
                    if ("student".equals(currentElement)) {
                        currentStudent = new Student();
                        Attribute idAttr = startElement.getAttributeByName(new QName("id"));
                        if (idAttr != null) {
                            currentStudent.setId(idAttr.getValue());
                        }
                    }
                }
                if (event.isCharacters()) {
                    String text = event.asCharacters().getData().trim();
                    // currentElement is null between elements; switching on null would throw.
                    if (!text.isEmpty() && currentStudent != null && currentElement != null) {
                        switch (currentElement) {
                            case "name":
                                currentStudent.setName(text);
                                break;
                            case "gender":
                                currentStudent.setGender(text);
                                break;
                            case "age":
                                currentStudent.setAge(Integer.parseInt(text));
                                break;
                            case "hobby":
                                currentStudent.addHobby(text);
                                break;
                            default:
                                break;
                        }
                    }
                }
                if (event.isEndElement()) {
                    String elementName = event.asEndElement().getName().getLocalPart();
                    if ("student".equals(elementName)) {
                        students.add(currentStudent);
                        currentStudent = null;
                    }
                    currentElement = null;
                }
                if (event.isEndDocument()) {
                    System.out.println("Document parsing finished");
                }
            }
        } finally {
            try {
                reader.close();
            } catch (XMLStreamException e) {
                e.printStackTrace();
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
        return Collections.emptyList();
    }
    return students;
}
```
5.1.3 Writing with XMLStreamWriter
Write the document:
```java
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public void writeWithStreamWriter(List<Student> students, String filePath) {
    // XMLStreamWriter.close() does not close the underlying stream, so manage it here.
    try (OutputStream out = new FileOutputStream(filePath)) {
        XMLOutputFactory factory = XMLOutputFactory.newInstance();
        XMLStreamWriter writer = factory.createXMLStreamWriter(out, "UTF-8");
        try {
            writer.writeStartDocument("UTF-8", "1.0");
            writer.writeCharacters("\n");

            // Root element.
            writer.writeStartElement("school");
            writer.writeAttribute("address", "Haidian District, Beijing");
            writer.writeCharacters("\n    ");

            writer.writeStartElement("name");
            writer.writeCharacters("Sunshine Primary School");
            writer.writeEndElement();
            writer.writeCharacters("\n    ");

            writer.writeStartElement("students");
            writer.writeCharacters("\n");

            for (Student student : students) {
                writer.writeCharacters("        ");
                writer.writeStartElement("student");
                writer.writeAttribute("id", student.getId());
                writer.writeCharacters("\n");

                writeElement(writer, "name", student.getName(), "            ");
                writeElement(writer, "gender", student.getGender(), "            ");
                writeElement(writer, "age", String.valueOf(student.getAge()), "            ");

                writer.writeCharacters("            ");
                writer.writeStartElement("hobbies");
                writer.writeCharacters("\n");
                for (String hobby : student.getHobbies()) {
                    writeElement(writer, "hobby", hobby, "                ");
                }
                writer.writeCharacters("            ");
                writer.writeEndElement(); // </hobbies>
                writer.writeCharacters("\n");

                writer.writeCharacters("        ");
                writer.writeEndElement(); // </student>
                writer.writeCharacters("\n");
            }

            writer.writeCharacters("    ");
            writer.writeEndElement(); // </students>
            writer.writeCharacters("\n");
            writer.writeEndElement(); // </school>
            writer.writeEndDocument();
            writer.flush();
        } finally {
            try {
                writer.close();
            } catch (XMLStreamException e) {
                e.printStackTrace();
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
```
Write a single element:
```java
// Writes <elementName>value</elementName> on its own line, prefixed by the given indent.
private void writeElement(XMLStreamWriter writer, String elementName, String value, String indent)
        throws XMLStreamException {
    writer.writeCharacters(indent);
    writer.writeStartElement(elementName);
    writer.writeCharacters(value);
    writer.writeEndElement();
    writer.writeCharacters("\n");
}
```
5.2 Advanced features
5.2.1 Streaming through a file
Process the student entries in batches:
```java
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public void processStudentList(String filePath) {
    try (InputStream in = new FileInputStream(filePath)) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        try {
            int studentCount = 0;
            int batchSize = 100;

            while (reader.hasNext()) {
                int eventType = reader.next();
                if (eventType == XMLStreamReader.START_ELEMENT
                        && "student".equals(reader.getLocalName())) {
                    // Count the student before checking the batch boundary.
                    studentCount++;
                    if (studentCount % batchSize == 0) {
                        System.out.println("Processed " + studentCount + " students");
                    }
                }
            }
            System.out.println("Processed " + studentCount + " students in total");
        } finally {
            try {
                reader.close();
            } catch (XMLStreamException e) {
                e.printStackTrace();
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
```
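The method above only reports progress. If each batch should actually be handed to downstream code, one possible shape is sketched below. The `Consumer` callback, the class name `BatchSketch`, and the choice to collect student ids are all assumptions for illustration, not part of the article's code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper that delivers student ids to a callback in fixed-size batches.
final class BatchSketch {

    /** Returns the total number of students seen. */
    static int processInBatches(String xml, int batchSize, Consumer<List<String>> handler)
            throws XMLStreamException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> batch = new ArrayList<>(batchSize);
        int total = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "student".equals(reader.getLocalName())) {
                    batch.add(reader.getAttributeValue(null, "id"));
                    total++;
                    if (batch.size() == batchSize) {
                        handler.accept(new ArrayList<>(batch)); // hand over a copy
                        batch.clear();
                    }
                }
            }
            if (!batch.isEmpty()) {
                handler.accept(new ArrayList<>(batch)); // final partial batch
            }
        } finally {
            reader.close();
        }
        return total;
    }
}
```

Only one batch is held in memory at a time, which keeps the memory footprint independent of the document size.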
5.2.2 Working with namespaces
Parse a document that uses namespaces:
```java
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public void parseWithNamespaces(String filePath) {
    try (InputStream in = new FileInputStream(filePath)) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Namespace awareness is on by default; setting it makes the intent explicit.
        factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, true);
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        try {
            while (reader.hasNext()) {
                int eventType = reader.next();
                if (eventType == XMLStreamReader.START_ELEMENT) {
                    String localName = reader.getLocalName();
                    String namespaceURI = reader.getNamespaceURI();
                    String prefix = reader.getPrefix();
                    System.out.println("Element: " + localName
                            + ", namespace: " + namespaceURI
                            + ", prefix: " + prefix);

                    // Namespaces declared on this element.
                    int namespaceCount = reader.getNamespaceCount();
                    for (int i = 0; i < namespaceCount; i++) {
                        String nsPrefix = reader.getNamespacePrefix(i);
                        String nsURI = reader.getNamespaceURI(i);
                        System.out.println("Namespace declaration: " + nsPrefix + "=" + nsURI);
                    }
                }
            }
        } finally {
            try {
                reader.close();
            } catch (XMLStreamException e) {
                e.printStackTrace();
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
```
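The sample school.xml declares no namespaces, so the sketch below uses a small inline document instead. The namespace URIs `urn:example:school` and `urn:example:hr`, the prefixes, and the class name are made up for the example. It filters elements by namespace URI rather than by prefix, since prefixes are only local aliases.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper that lists the local names of elements in one namespace.
final class NamespaceSketch {

    static List<String> elementsInNamespace(String xml, String namespaceUri)
            throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                // Compare the resolved URI, not the prefix used in the document.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && namespaceUri.equals(reader.getNamespaceURI())) {
                    names.add(reader.getLocalName());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<s:school xmlns:s=\"urn:example:school\" xmlns:h=\"urn:example:hr\">"
                + "<s:name>Sunshine Primary School</s:name><h:teacher/></s:school>";
        System.out.println(elementsInNamespace(xml, "urn:example:school"));
    }
}
```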
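6 Comparison
Comparison of the three parsing approaches:
| Feature | DOM | SAX | StAX |
| --- | --- | --- | --- |
| Parsing model | Tree structure | Event-driven (push mode) | Streaming (pull mode) |
| Memory use | High (whole document loaded into memory) | Low (whole document is not stored) | Low (whole document is not stored) |
| Performance | Slow initial parse, fast subsequent access | Fast parsing, suited to large files | Fast parsing, close to SAX |
| Access pattern | Random access | Sequential (forward only) | Sequential (application controls the flow) |
| Control | Application | Parser | Application |
| Write support | Full support | Read-only; writing is difficult | Reading and writing |
| Ease of use | Simple and intuitive | Complex; requires state management | Moderate; simpler than SAX |
| Typical use | Small documents that need modification | Large documents, read-only processing | Large documents that need flow control |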