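Abstract: This article describes how to parse XML documents with DOM.
Environment
Windows 10 Enterprise LTSC 21H2, Java 1.8
1 Definition
DOM (Document Object Model) is the standard programming interface recommended by the W3C for processing Extensible Markup Language (XML) documents. It is not limited to XML and also applies to HTML.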
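2 Basic principles
The core workflow of a DOM parser:
Load the XML document: read the file or input stream and load it into memory.
Build the DOM tree: following the document's syntax and nesting, create a node object in memory for each construct and assemble the nodes into a complete document tree. This step consumes memory, and the size of the tree grows in proportion to the size of the document.
Operate on the DOM tree: use the DOM API to traverse, query, add, modify, or delete nodes in the tree.
Save the changes: if the tree was modified, write it back to an XML file with a tool such as Transformer.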
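The four steps above can be sketched end to end on an in-memory string. This is a minimal sketch rather than the article's own implementation: the class name `DomWorkflowSketch`, the helper `roundTrip`, and the sample content are assumptions made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomWorkflowSketch {

    // Hypothetical helper: load, modify, and serialize a tiny document.
    static String roundTrip(String xml) throws Exception {
        // Steps 1 and 2: read the input and build the DOM tree in memory.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document document = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        // Step 3: operate on the tree (append a <name> child to the root).
        Element root = document.getDocumentElement();
        Element name = document.createElement("name");
        name.setTextContent("Sunshine Primary School");
        root.appendChild(name);

        // Step 4: write the modified tree back out, here to a string.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("<school/>"));
    }
}
```

Writing to a `StringWriter` instead of a file keeps the sketch free of file-system assumptions; swapping in a file-based `StreamResult` gives the save step described above.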
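Key characteristics:
Single load: the entire XML document is read into memory at once.
Tree structure: a hierarchical tree of nodes is built in memory.
Random access: any node in the tree can be reached and manipulated in any order.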
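DOM treats everything in an XML document as a node, managed uniformly through the Node interface. Common node types:
Document node: the root of the whole tree. It represents the document itself and is the entry point to the DOM tree.
Element node: the most common node type. It represents an XML element and is the core structural unit.
Text node: the text content of an element, holding the actual data values.
Attribute node: an attribute of an element, stored as a name/value pair. It is attached to its element but is not a child of it, so it does not take part in the parent/child hierarchy of the tree.
CDATA section node: a special kind of text node that holds text the parser must not interpret as markup.
Comment node: a comment in the XML document, used to annotate the structure or explain the data.
Document type node: the document's DOCTYPE declaration, which names the document type definition (DTD). It is distinct from the `<?xml ...?>` declaration.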
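To make the node types above concrete, the following sketch parses a small in-memory document and labels the direct children of its root by `getNodeType()`. The class `NodeTypeSketch`, the helper names, and the sample XML are illustrative assumptions; the factory is left at its default settings, under which comments and CDATA sections are kept as separate nodes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NodeTypeSketch {

    // Map the numeric node type constant to a readable label.
    static String typeName(Node node) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:      return "Document";
            case Node.ELEMENT_NODE:       return "Element";
            case Node.TEXT_NODE:          return "Text";
            case Node.ATTRIBUTE_NODE:     return "Attr";
            case Node.CDATA_SECTION_NODE: return "CDATA";
            case Node.COMMENT_NODE:       return "Comment";
            case Node.DOCUMENT_TYPE_NODE: return "DocumentType";
            default:                      return "Other";
        }
    }

    // Hypothetical helper: list the types of the root element's direct children.
    static List<String> childTypes(String xml) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String> types = new ArrayList<>();
        NodeList children = document.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            types.add(typeName(children.item(i)));
        }
        return types;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<note id=\"n1\"><!-- remark -->hello<![CDATA[<raw>]]><b/></note>";
        // The id attribute is not listed: attributes are not child nodes.
        System.out.println(childTypes(xml));
    }
}
```

The `id` attribute is reachable through `getAttributes()` on the element, not through `getChildNodes()`, which matches the description of attribute nodes above.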
3 Sample document
Take a simple XML document as an example:
school.xml

```xml
<?xml version="1.0" encoding="UTF-8"?>
<school address="北京市海淀区">
    <name>阳光小学</name>
    <teachers>
        <teacher id="teacher_1">
            <name>李明</name>
            <subject>语文</subject>
        </teacher>
        <teacher id="teacher_2">
            <name>赵强</name>
            <subject>数学</subject>
        </teacher>
    </teachers>
    <students>
        <student id="student_1">
            <name>张婷</name>
            <gender>女</gender>
            <age>13</age>
            <hobbies>
                <hobby>画画</hobby>
                <hobby>弹琴</hobby>
            </hobbies>
        </student>
        <student id="student_2">
            <name>王浩</name>
            <gender>男</gender>
            <age>14</age>
            <hobbies>
                <hobby>跑步</hobby>
                <hobby>游泳</hobby>
            </hobbies>
        </student>
    </students>
</school>
```
The corresponding DOM tree is shown below. The XML declaration is not a node in the W3C DOM (its values are exposed through methods such as `Document.getXmlVersion()`), so it does not appear in the tree. Whitespace-only text nodes between elements are omitted for readability.

```
Document
└── Element: school
    ├── Attr: address="北京市海淀区"
    ├── Element: name
    │   └── Text: "阳光小学"
    ├── Element: teachers
    │   ├── Element: teacher
    │   │   ├── Attr: id="teacher_1"
    │   │   ├── Element: name
    │   │   │   └── Text: "李明"
    │   │   └── Element: subject
    │   │       └── Text: "语文"
    │   └── Element: teacher
    │       ├── Attr: id="teacher_2"
    │       ├── Element: name
    │       │   └── Text: "赵强"
    │       └── Element: subject
    │           └── Text: "数学"
    └── Element: students
        ├── Element: student
        │   ├── Attr: id="student_1"
        │   ├── Element: name
        │   │   └── Text: "张婷"
        │   ├── Element: gender
        │   │   └── Text: "女"
        │   ├── Element: age
        │   │   └── Text: "13"
        │   └── Element: hobbies
        │       ├── Element: hobby
        │       │   └── Text: "画画"
        │       └── Element: hobby
        │           └── Text: "弹琴"
        └── Element: student
            ├── Attr: id="student_2"
            ├── Element: name
            │   └── Text: "王浩"
            ├── Element: gender
            │   └── Text: "男"
            ├── Element: age
            │   └── Text: "14"
            └── Element: hobbies
                ├── Element: hobby
                │   └── Text: "跑步"
                └── Element: hobby
                    └── Text: "游泳"
```
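A tree diagram like this usually leaves out the whitespace between tags, but a parser that is not validating against a DTD keeps that whitespace as text nodes. The sketch below counts them on a small indented document; `WhitespaceNodeSketch`, `countRootChildren`, and the sample string are assumptions for illustration only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class WhitespaceNodeSketch {

    // Returns {all child nodes, element child nodes} of the root element.
    static int[] countRootChildren(String xml) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList children = document.getDocumentElement().getChildNodes();
        int elements = 0;
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                elements++;
            }
        }
        return new int[] {children.getLength(), elements};
    }

    public static void main(String[] args) throws Exception {
        // Two element children, separated and surrounded by line breaks and indentation.
        String xml = "<school>\n  <name>A</name>\n  <teachers/>\n</school>";
        int[] counts = countRootChildren(xml);
        System.out.println("all children: " + counts[0] + ", elements: " + counts[1]);
    }
}
```

This is why traversal code that walks `getChildNodes()` typically filters out text nodes whose trimmed value is empty.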
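4 Core components
4.1 Factory classes
4.1.1 DocumentBuilderFactory
Common methods:

```java
// Obtain a new factory instance.
public static DocumentBuilderFactory newInstance();
// Create a DocumentBuilder that uses the factory's current configuration.
public abstract DocumentBuilder newDocumentBuilder() throws ParserConfigurationException;
// Make parsers created by this factory namespace-aware (off by default).
public void setNamespaceAware(boolean awareness);
```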
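4.1.2 DocumentBuilder
Common methods:

```java
// Parse the content at the given URI.
public Document parse(String uri) throws SAXException, IOException;
// Parse the content of the given file.
public Document parse(File f) throws SAXException, IOException;
// Parse the content of the given input stream.
public Document parse(InputStream is) throws SAXException, IOException;
// Parse the content of the given SAX input source.
public abstract Document parse(InputSource is) throws SAXException, IOException;
// Create a new, empty Document for building a tree from scratch.
public abstract Document newDocument();
```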
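The `parse` overloads differ only in where the bytes or characters come from. As a rough illustration, the sketch below feeds the same XML to `parse(InputSource)` through a `StringReader` and to `parse(InputStream)` through a byte array; `ParseSourcesSketch`, `fromString`, and `fromBytes` are hypothetical names, not part of the article's code.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ParseSourcesSketch {

    static DocumentBuilder newBuilder() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    // Parse XML that is already held as a Java String (character data).
    static Document fromString(String xml) throws Exception {
        return newBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Parse XML held as raw bytes; the parser detects the encoding itself.
    static Document fromBytes(byte[] bytes) throws Exception {
        return newBuilder().parse(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<school><name>A</name></school>";
        Document a = fromString(xml);
        Document b = fromBytes(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(a.getDocumentElement().getNodeName());
        System.out.println(b.getDocumentElement().getNodeName());
    }
}
```

Passing a `Reader` via `InputSource` is convenient when the text is already decoded; the stream overload is the natural choice for files and network responses.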
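4.2 Common classes
4.2.1 Document
Common methods:

```java
// Return the root element of the document.
public Element getDocumentElement();
// Create a new element with the given tag name (not yet attached to the tree).
public Element createElement(String tagName) throws DOMException;
// Find an element by ID; only works for attributes declared as type ID (e.g. via a DTD).
public Element getElementById(String id);
// Return all descendant elements with the given tag name, in document order.
public NodeList getElementsByTagName(String name);
// Create a text node holding the given data.
public Text createTextNode(String data);
// Create an attribute node with the given name.
public Attr createAttribute(String name) throws DOMException;
```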
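The `create*` methods only produce detached nodes; they become part of the tree once they are appended. A small sketch of building a document from scratch follows, assuming a made-up teacher entry (`teacher_3`, `Chen`) and a hypothetical class name `BuildDocumentSketch`.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class BuildDocumentSketch {

    // Build <teacher id="teacher_3"><name>Chen</name></teacher> in memory.
    static Document build() throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();

        // Nodes are created by the owning document, then attached explicitly.
        Element teacher = document.createElement("teacher");
        Attr id = document.createAttribute("id");
        id.setValue("teacher_3");
        teacher.setAttributeNode(id);

        Element name = document.createElement("name");
        name.appendChild(document.createTextNode("Chen"));
        teacher.appendChild(name);

        // A document can have only one root element.
        document.appendChild(teacher);
        return document;
    }

    public static void main(String[] args) throws Exception {
        Document document = build();
        Element root = document.getDocumentElement();
        System.out.println(root.getTagName() + " " + root.getAttribute("id"));
        System.out.println(document.getElementsByTagName("name").item(0).getTextContent());
    }
}
```

In everyday code `Element.setAttribute(name, value)` is shorter than creating an `Attr` by hand; the explicit form is used here only to exercise `createAttribute`.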
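4.2.2 Node
Common methods:

```java
// Node name: the tag name for elements, "#text" for text nodes, and so on.
public String getNodeName();
// Node value: the text for text/attribute/comment nodes, null for elements.
public String getNodeValue() throws DOMException;
// Set the node value (returns void; has no effect on nodes whose value is defined as null).
public void setNodeValue(String nodeValue) throws DOMException;
// Attributes of an element node; null for other node types.
public NamedNodeMap getAttributes();
// Concatenated text content of this node and its descendants.
public String getTextContent() throws DOMException;
// Replace all children with a single text node holding the given text.
public void setTextContent(String text) throws DOMException;
// Parent node, or null if there is none.
public Node getParentNode();
// All direct children, including text and comment nodes.
public NodeList getChildNodes();
// First child, or null.
public Node getFirstChild();
// Last child, or null.
public Node getLastChild();
// Sibling immediately before this node, or null.
public Node getPreviousSibling();
// Sibling immediately after this node, or null.
public Node getNextSibling();
// Append newChild as the last child and return it.
public Node appendChild(Node newChild) throws DOMException;
// Insert newChild before refChild and return it.
public Node insertBefore(Node newChild, Node refChild) throws DOMException;
// Remove oldChild from the children and return it.
public Node removeChild(Node oldChild) throws DOMException;
// Replace oldChild with newChild and return oldChild.
public Node replaceChild(Node newChild, Node oldChild) throws DOMException;
```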
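Because `getFirstChild()` and `getNextSibling()` also return whitespace text nodes, navigation code often wraps them in small helpers that skip to the next element. The following is one possible sketch; `NodeNavigationSketch` and its helper methods are hypothetical, and the sample data is invented.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NodeNavigationSketch {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // First child that is an element, or null if there is none.
    static Element firstElementChild(Node parent) {
        Node child = parent.getFirstChild();
        while (child != null && child.getNodeType() != Node.ELEMENT_NODE) {
            child = child.getNextSibling();
        }
        return (Element) child;
    }

    // Next sibling that is an element, or null if there is none.
    static Element nextElementSibling(Node node) {
        Node sibling = node.getNextSibling();
        while (sibling != null && sibling.getNodeType() != Node.ELEMENT_NODE) {
            sibling = sibling.getNextSibling();
        }
        return (Element) sibling;
    }

    public static void main(String[] args) throws Exception {
        Document document = parse(
                "<student>\n  <name>Zhang Ting</name>\n  <age>13</age>\n</student>");
        Element root = document.getDocumentElement();
        // Walk the element children only, skipping whitespace text nodes.
        for (Element e = firstElementChild(root); e != null; e = nextElementSibling(e)) {
            System.out.println(e.getNodeName() + " = " + e.getTextContent());
        }
    }
}
```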
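4.2.3 Element
Common methods:

```java
// Tag name of the element.
public String getTagName();
// Attribute value by name; an empty string if the attribute is absent.
public String getAttribute(String name);
// Add the attribute, or overwrite its value if it already exists.
public void setAttribute(String name, String value) throws DOMException;
// Remove the attribute by name; does nothing if it is absent.
public void removeAttribute(String name) throws DOMException;
// Attribute node by name, or null if the attribute is absent.
public Attr getAttributeNode(String name);
// Attach an attribute node; returns the replaced node, or null.
public Attr setAttributeNode(Attr newAttr) throws DOMException;
// Detach the given attribute node and return it.
public Attr removeAttributeNode(Attr oldAttr) throws DOMException;
```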
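One detail that is easy to miss is that `getAttribute` returns an empty string, not `null`, for a missing attribute, so `hasAttribute` (also part of `Element`) is the way to tell "absent" from "empty". A short sketch, with the class name `ElementAttributeSketch` and the attribute values chosen purely for illustration:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class ElementAttributeSketch {

    // Create a detached-from-file <student id="student_3"/> element.
    static Element newStudent() throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
        Element student = document.createElement("student");
        document.appendChild(student);
        student.setAttribute("id", "student_3");
        return student;
    }

    public static void main(String[] args) throws Exception {
        Element student = newStudent();

        // Present attribute: the stored value is returned.
        System.out.println("id = " + student.getAttribute("id"));

        // Missing attribute: getAttribute yields "", hasAttribute yields false.
        System.out.println("grade is empty: " + student.getAttribute("grade").isEmpty());
        System.out.println("has grade: " + student.hasAttribute("grade"));

        // setAttribute overwrites an existing value.
        student.setAttribute("id", "student_4");
        System.out.println("id = " + student.getAttribute("id"));

        // After removal, the attribute node lookup returns null.
        student.removeAttribute("id");
        System.out.println("id node is null: " + (student.getAttributeNode("id") == null));
    }
}
```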
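5 Practical usage
5.1 Basic usage
Main method:

```java
public static void main(String[] args) {
    DOM dom = new DOM();
    // Parse the sample file; parse(...) returns null if parsing fails.
    Document document = dom.parse("src/main/resources/school.xml");
    if (document == null) {
        return;
    }
    // Take the first <student> element and print its subtree.
    Element student = (Element) document.getElementsByTagName("student").item(0);
    dom.traverse(student, 0);
}
```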
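Create the factory and parse the document:

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public Document parse(String filePath) {
    try {
        // Create the factory, then a builder configured by that factory.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Read the file and build the DOM tree in memory.
        return builder.parse(new File(filePath));
    } catch (Exception e) {
        // Log the failure and signal it to the caller with null.
        e.printStackTrace();
        return null;
    }
}
```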
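Traverse nodes:

```java
import java.util.Collections;
import java.util.Optional;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public void traverse(Node node, int depth) {
    // One indent unit per depth level.
    String indent = String.join("", Collections.nCopies(depth, " "));
    // Skip whitespace-only text nodes produced by line breaks and indentation.
    if (node.getNodeType() == Node.TEXT_NODE && node.getNodeValue().trim().isEmpty()) {
        return;
    }
    // Element nodes have a null value, so print an empty string for them.
    System.out.println(indent + node.getNodeName() + ": "
            + Optional.ofNullable(node.getNodeValue()).orElse(""));
    // Recurse into the children, one level deeper.
    NodeList childNodes = node.getChildNodes();
    for (int i = 0; i < childNodes.getLength(); i++) {
        traverse(childNodes.item(i), depth + 1);
    }
}
```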
Query nodes:

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public void select(Document document) {
    // Collect every <student> element in document order.
    NodeList studentList = document.getElementsByTagName("student");
    System.out.println("Found " + studentList.getLength() + " student(s)");
    for (int i = 0; i < studentList.getLength(); i++) {
        Element student = (Element) studentList.item(i);
        // Read the id attribute and the text of the first <name> and <gender> descendants.
        String id = student.getAttribute("id");
        String name = student.getElementsByTagName("name").item(0).getTextContent();
        String gender = student.getElementsByTagName("gender").item(0).getTextContent();
        System.out.println("ID: " + id + ", name: " + name + ", gender: " + gender);
    }
}
```
Modify nodes:

```java
public void update(Document document) {
    Element student = (Element) document.getElementsByTagName("student").item(0);
    // Modify: change the first student's age to 18.
    student.getElementsByTagName("age").item(0).setTextContent("18");
    // Add: append a new <hobby> ("唱歌", singing) to the student's <hobbies>.
    Node hobbies = student.getElementsByTagName("hobbies").item(0);
    Element newHobby = document.createElement("hobby");
    newHobby.setTextContent("唱歌");
    hobbies.appendChild(newHobby);
    // Delete: remove the student's first <hobby>.
    Element hobby = (Element) student.getElementsByTagName("hobby").item(0);
    hobbies.removeChild(hobby);
}
```
5.2 Advanced features
5.2.1 Querying with XPath
Example:

```java
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public void selectUseXPath(Document document) {
    try {
        XPathFactory xpathFactory = XPathFactory.newInstance();
        XPath xpath = xpathFactory.newXPath();
        // Select the <name> of every student whose <age> is greater than 13.
        String expression = "/school//student[age > 13]/name";
        NodeList result = (NodeList) xpath.evaluate(expression, document, XPathConstants.NODESET);
        for (int i = 0; i < result.getLength(); i++) {
            System.out.println("Matching student name: " + result.item(i).getTextContent());
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
```
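Besides `NODESET`, `XPath.evaluate` can return a number or a string directly, which avoids iterating a `NodeList` for simple lookups. The sketch below assumes a small in-memory document with romanized sample names; `XPathTypesSketch` and its `number`/`string` helpers are illustrative names, not part of the article's `DOM` class.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathTypesSketch {

    // Invented sample data mirroring the shape of school.xml.
    static final String XML = "<school>"
            + "<student id=\"student_1\"><name>Zhang Ting</name><age>13</age></student>"
            + "<student id=\"student_2\"><name>Wang Hao</name><age>14</age></student>"
            + "</school>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Evaluate an expression as an XPath number (always a Double).
    static double number(Document document, String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Double) xpath.evaluate(expression, document, XPathConstants.NUMBER);
    }

    // Evaluate an expression as a string: the string value of the first match.
    static String string(Document document, String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, document);
    }

    public static void main(String[] args) throws Exception {
        Document document = parse(XML);
        System.out.println(number(document, "count(/school/student[age > 13])"));
        System.out.println(string(document, "/school/student[@id='student_2']/name"));
    }
}
```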
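5.2.2 Saving with XSLT (Transformer)
Example:

```java
import java.io.File;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public void save(Document document, String filePath) {
    try {
        // A Transformer created without a stylesheet performs an identity transform.
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        Transformer transformer = transformerFactory.newTransformer();
        // Output settings: encoding, XML version, and indentation.
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.setOutputProperty(OutputKeys.VERSION, "1.0");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        // Implementation-specific (Xalan) property; other transformers may ignore it.
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        // Serialize the DOM tree to the target file.
        DOMSource source = new DOMSource(document);
        StreamResult result = new StreamResult(new File(filePath));
        transformer.transform(source, result);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
```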
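A variant worth considering is to let the caller see failures instead of swallowing them, and to write through an `OutputStream` that is closed deterministically. The sketch below is an alternative under those assumptions, not the article's method; `SaveSketch`, `sample`, and the temporary file are made up for the demonstration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class SaveSketch {

    // Hypothetical sample: a document with a single empty <school> root.
    static Document sample() throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
        document.appendChild(document.createElement("school"));
        return document;
    }

    // Serialize the tree to the target path; errors propagate to the caller.
    static void save(Document document, Path target) throws IOException, TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        // try-with-resources closes the stream even if the transform fails.
        try (OutputStream out = Files.newOutputStream(target)) {
            transformer.transform(new DOMSource(document), new StreamResult(out));
        }
    }

    public static void main(String[] args) throws Exception {
        Path target = Files.createTempFile("school", ".xml");
        save(sample(), target);
        System.out.println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
        Files.delete(target);
    }
}
```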
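5.2.3 Using namespaces
Set a namespace on the root element:

```java
public void setNamespace(Document document, String namespace) {
    try {
        Element school = document.getDocumentElement();
        // Add a default-namespace declaration (xmlns="...") to the root element.
        // Nodes already in memory keep their original namespace; the declaration
        // takes effect once the document is saved and re-parsed namespace-aware.
        school.setAttributeNS("http://www.w3.org/2000/xmlns/", "xmlns", namespace);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
```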
Parse a document with namespace support:

```java
public Document parseWithNamespace(String filePath) {
    try {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Namespace awareness is off by default and must be enabled
        // before the builder is created.
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new File(filePath));
    } catch (Exception e) {
        // Log the failure and signal it to the caller with null.
        e.printStackTrace();
        return null;
    }
}
```
Query nodes with a namespace:

```java
public void selectWithNamespace(Document document, String namespace) {
    // Match <student> elements by namespace URI and local name.
    NodeList studentList = document.getElementsByTagNameNS(namespace, "student");
    System.out.println("Found " + studentList.getLength() + " student(s)");
    for (int i = 0; i < studentList.getLength(); i++) {
        Element student = (Element) studentList.item(i);
        // Unprefixed attributes are in no namespace, so getAttribute still applies.
        String id = student.getAttribute("id");
        String name = student.getElementsByTagNameNS(namespace, "name").item(0).getTextContent();
        String gender = student.getElementsByTagNameNS(namespace, "gender").item(0).getTextContent();
        System.out.println("ID: " + id + ", name: " + name + ", gender: " + gender);
    }
}
```
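The effect of `setNamespaceAware(true)` can be seen by parsing the same text twice. In the sketch below, the namespace URI `http://example.com/school` and the class `NamespaceSketch` are placeholders chosen for illustration; the document declares that URI as its default namespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NamespaceSketch {

    // Hypothetical namespace URI, used only for this example.
    static final String NS = "http://example.com/school";

    static final String XML = "<school xmlns=\"" + NS + "\">"
            + "<student id=\"s1\"/><student id=\"s2\"/>"
            + "</school>";

    static Document parse(String xml, boolean namespaceAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Namespace-aware: elements carry the declared namespace URI.
        Document aware = parse(XML, true);
        System.out.println(aware.getDocumentElement().getNamespaceURI());
        System.out.println(aware.getElementsByTagNameNS(NS, "student").getLength());

        // Not namespace-aware (the default): the root has no namespace URI.
        Document unaware = parse(XML, false);
        System.out.println(unaware.getDocumentElement().getNamespaceURI());
    }
}
```

With the default, non-aware factory, namespace-based lookups such as `getElementsByTagNameNS` have nothing to match against, which is why the factory must be configured before parsing.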