超大数据量的xlsx格式的excel文件的读取和解析,解决了POI方式的内存溢出和性能问题
在02處直接就會內存溢出了。無論怎么搶救都沒用,看來得要換一條路走走了。
在不停的Google查資料折騰了兩天之后,在POI官網成功的找到了解決方案。此處請允許我稍微吐槽一下POI,有瓶頸問題的解決方案卻隱藏的很深,只有一個不起眼的小鏈接,難道是怕大家都知道有點丟臉?
總結一下該方案的主要原理:超大數據量的excel文件通過頁面上傳后(nginx的默認最大文件字節要調大一些:client_max_body_size ?xxxm),后臺接收到該文件的對象CommonsMultipartFile。首先獲取該文件對象的inputStream,然后用OPCPackage來打開該文件流,將excel在內存中轉化為xml格式卻不會出現內存溢出的問題,根據該xml的標簽就可以識別是格式,標題還是內容。然后在內存中通過POI框架中的XSSFSheetXMLHandler類中的sheetContentsHandler接口來根據上述的標簽來解析內容。可以將解析到的內容存入list或者map容器中便于后續的業務處理(純內容數據,幾百萬的數據量輕松勝任,親測不會內存溢出)。當然根據業務需要,需要復寫sheetContentsHandler接口的startRow,endRow,cell,headerFooter四大方法。
當然了,筆者也親測了另一種方案:就是用OPCPackage來打開該文件流并且將excel在內存中轉化為xml格式之后,一股腦兒的用緩沖流分批的將所有原excel內容數據寫入到本地的txt文件,再去分批的readLine該文件中的數據,進行業務處理。該方案的好處是交易導入的文件可以物理的落地,作為后期查驗的依據和憑證。當然了,缺點是多了一次物理轉儲的過程,耗時會延長。如何選擇看個人的業務需求咯。
本文中重點講述第一種方案吧,話不多說,貼代碼:
/*** @return* @throws Exception* @author jason.gao* 功能描述:交易導入*/@RequestMapping(value = "/transDetail/upload", method = {RequestMethod.POST, RequestMethod.GET})@RequestGuard(perm = "transaction.import.upload")public ResponseEntity<ResponseEnvelope<RestApiResp>> uploadFile(@RequestParam("file") CommonsMultipartFile[] files, HttpServletRequest req, HttpServletResponse resp) throws IOException {logger.info("uploadFile == >upload button start; fileName:[{}], CommonsMultipartFile[]:[{}]", files[0].getFileItem().getName(), files);long start = System.currentTimeMillis();String result = "完成交易文件的導入!";if (null == files || files.length != 1) {return RestApiResp.getSuccResponseEntity("必須上傳一個文件", null);}//重置buffer,在可能會超時的地方輸出resp的字節,避免前端ajax請求斷開!resp.setBufferSize(1);ServletOutputStream out = resp.getOutputStream();XlsxProcessAbstract xlsxProcess = new XlsxProcessAbstract();long getFileAndDataTime;ProcessTransDetailDataDto data;try {//獲取明細行累積的支付/退款的總金額/總筆數等數據的DTOdata = xlsxProcess.processAllSheet(files[0]);logger.info("匯總行的數據:[{}]", data.dtoToString());//獲取匯總行和明細行數據(包含標題)List<String> contentList = data.contentList;logger.info("明細行的數據條數為:[{}]", JSON.toJSONString(contentList.size() - 3));getFileAndDataTime = System.currentTimeMillis();logger.info("獲取文件并得到數據完成。耗時:[{}]秒", (getFileAndDataTime - start)/1000);//校驗匯總行數據正確性checkDetailSummary(contentList, data, out);logger.info("匯總行數據正確性的校驗已通過!");//分批調用OSP插入過程String handleResult = doOspHandle(contentList, data, out);if (!handleResult.equals(TransImportJobStatus.Success.getValue())) {result = TransImportJobStatus.getDescByKey(handleResult);logger.error(result);}} catch (CellDataException e) {logger.error("CellDataException: Error:[{}]", e);return RestApiResp.getSuccResponseEntity(e.getMessage(), null);} catch (OspException e) {logger.error("OspException:[{}]", e);return RestApiResp.getSuccResponseEntity(e.getMessage(), null);} catch (IOException e) {logger.error("IOException:[{}]", e);return RestApiResp.getSuccResponseEntity(e.getMessage(), null);} catch (Exception e) {logger.error("未知異常:[{}]", e);return RestApiResp.getSuccResponseEntity("未知異常,請排查日志:" + e.getMessage(), null);}long finishCheckAndInsertTime = System.currentTimeMillis();logger.info("完成數據校驗和數據分批插入。耗時:[{}]秒", (finishCheckAndInsertTime - getFileAndDataTime)/1000);logger.info("[{}],整個后臺處理過程共耗時:[{}]秒", result, (finishCheckAndInsertTime - start)/1000);return RestApiResp.getSuccResponseEntity(result, HttpStatus.OK);}上面代碼塊是整個后臺的主流程,注意的是要充分的捕捉異常,將異常信息呈獻給前端頁面和日志系統,便于生產故障時排查問題。
接下來的四個代碼塊是對excel中字段的業務處理,屬于業務部分,不關心業務的可以忽略這些代碼片段。
/*** 校驗匯總行所有必填項*/public void checkHeadNotEmpty(ProcessTransDetailDataDto dataDto) throws CellDataException{if (dataDto.getMerchantId()==null || dataDto.getMerchantId().equals("")) {throw new CellDataException(TransImpFileExceptEnums.HeadDataRrror.getValue(), TransImpFileExceptEnums.HeadDataRrror.setParams(1).getDesc());}if (dataDto.getMerchantName()==null || dataDto.getMerchantName().equals("")) {throw new CellDataException(TransImpFileExceptEnums.HeadDataRrror.getValue(), TransImpFileExceptEnums.HeadDataRrror.setParams(2).getDesc());}if (dataDto.getSettleOrderNo()==null || dataDto.getSettleOrderNo().equals("")) {throw new CellDataException(TransImpFileExceptEnums.HeadDataRrror.getValue(), TransImpFileExceptEnums.HeadDataRrror.setParams(3).getDesc());}if (dataDto.getTotalPaymentCount() == null || dataDto.getTotalPaymentCount().equals("")) {throw new CellDataException(TransImpFileExceptEnums.HeadDataRrror.getValue(), TransImpFileExceptEnums.HeadDataRrror.setParams(4).getDesc());}if (dataDto.getTotalPaymentAmount()==null || dataDto.getTotalPaymentAmount().equals("")) {throw new CellDataException(TransImpFileExceptEnums.HeadDataRrror.getValue(), TransImpFileExceptEnums.HeadDataRrror.setParams(5).getDesc());}if (dataDto.getTotalRefundCount()==null || dataDto.getTotalRefundCount().equals("")) {throw new CellDataException(TransImpFileExceptEnums.HeadDataRrror.getValue(), TransImpFileExceptEnums.HeadDataRrror.setParams(6).getDesc());}if (dataDto.getTotalRefundAmount()==null || dataDto.getTotalRefundAmount().equals("")) {throw new CellDataException(TransImpFileExceptEnums.HeadDataRrror.getValue(), TransImpFileExceptEnums.HeadDataRrror.setParams(7).getDesc());}if (dataDto.getNetTotalCount()==null || dataDto.getNetTotalCount().equals("")) {throw new CellDataException(TransImpFileExceptEnums.HeadDataRrror.getValue(), TransImpFileExceptEnums.HeadDataRrror.setParams(8).getDesc());}if (dataDto.getNetTotalAmount()==null || dataDto.getNetTotalAmount().equals("")) {throw new CellDataException(TransImpFileExceptEnums.HeadDataRrror.getValue(), TransImpFileExceptEnums.HeadDataRrror.setParams(9).getDesc());}}
package com.vip.vpal.mgr.controller;import com.vip.vpal.mgr.enums.IdTypeCustomerTypeEnums; import com.vip.vpal.mgr.enums.TransImpFileExceptEnums; import com.vip.vpal.mgr.enums.TransTypeEnums; import com.vip.vpal.mgr.exception.CellDataException; import org.springframework.beans.factory.annotation.Autowired;/*** Created by jason.gao on 2017/8/11 0011.*/ public class XssfCellValueCheckHelper {public static String getStringNotEmpty(String cellValue, int row, int col) throws CellDataException {if (cellValue.equals("")) {throw new CellDataException(TransImpFileExceptEnums.EmptyFieldError.getValue(), TransImpFileExceptEnums.EmptyFieldError.setParams(row+1, col +1).getDesc());}return cellValue;}public static String getRealOrDefaultValue(String cellValue, String defVlaue) {if (cellValue.equals("")) {return defVlaue;}return cellValue;}/*** 校驗原支付訂單號在退款時必填* 返回原支付訂單號*/public static String checkAndGetOriginOrderNo(String cellValue, String transType, int row) throws CellDataException {if (transType.equals(TransTypeEnums.Refund.getValue()) && cellValue.equals("")) {throw new CellDataException(TransImpFileExceptEnums.EmptyFieldError.getValue(), TransImpFileExceptEnums.EmptyFieldError.setParams(row +1, 4).getDesc());}return cellValue; //原支付訂單號}/*** 校驗證件類型和客戶類型的關系一致性* 返回客戶類型的值*/public static String checkAndGetCustomerType(String[] cellRow, int row, int cell) throws CellDataException {//校驗證件類型是否正確String idType = cellRow[10];String customerTypeByDict = IdTypeCustomerTypeEnums.getDesc(idType);if (customerTypeByDict.equals("")) {throw new CellDataException(TransImpFileExceptEnums.DetailDateError.getValue(), TransImpFileExceptEnums.DetailDateError.setParams(row + 1, 11).getDesc());}//校驗客戶類型是否與證件類型相匹配String cusTomerType = cellRow[8]; //客戶類型,可不填,默認為證件類型對應的if (cusTomerType.equals("")) {return customerTypeByDict;}if (!customerTypeByDict.equals(cusTomerType)){throw new CellDataException(TransImpFileExceptEnums.DetailDateError.getValue(), TransImpFileExceptEnums.DetailDateError.setParams(row+1, 9).getDesc());}return cusTomerType;}public static long getAmount(String cellValue, int row) throws CellDataException {if (cellValue.equals("")) {throw new CellDataException(TransImpFileExceptEnums.EmptyFieldError.getValue(), TransImpFileExceptEnums.EmptyFieldError.setParams(row+1, 8).getDesc());}return (long) (new Double(cellValue) * 100);}}
上面四個代碼塊是主流程對excel中字段的業務處理,其中調用OSP接口(一種rpc)的時候要注意,由于是分布式的遠程調用,所以不可以使用事物的失敗回滾方案,只能手動的捕捉異常并且手動的調用失敗的補償方法。
下面注意了,是超大數據量的excel文件解析的核心代碼了:
package com.vip.vpal.mgr.controller;import java.io.IOException; import java.io.InputStream;import javax.xml.parsers.ParserConfigurationException;import org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable; import org.apache.poi.xssf.eventusermodel.XSSFReader; import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler; import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler; import org.apache.poi.xssf.model.StylesTable; import org.apache.poi.xssf.usermodel.XSSFComment; import org.apache.poi.openxml4j.opc.OPCPackage; import org.apache.poi.openxml4j.opc.PackageAccess; import org.apache.poi.ss.usermodel.DataFormatter; import org.apache.poi.ss.util.CellAddress; import org.apache.poi.ss.util.CellReference; import org.apache.poi.util.SAXHelper; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import org.springframework.web.multipart.commons.CommonsMultipartFile; import org.xml.sax.ContentHandler; import org.xml.sax.InputSource; import org.xml.sax.SAXException; import org.xml.sax.XMLReader;import com.vip.vpal.mgr.dto.ProcessTransDetailDataDto;/*** Excle xxls 批量讀取大文件操作類* */ public class XlsxProcessAbstract {private final Logger logger = LoggerFactory.getLogger(XlsxProcessAbstract.class);//開始讀取行數從第0行開始計算private int rowIndex = -1;private final int minColumns = 0;/*** Destination for data*/private final StringBuffer rowStrs = new StringBuffer();ProcessTransDetailDataDto processTransDetailData = new ProcessTransDetailDataDto();/*** 支持遍歷同一個excle文件下多個sheet的解析* excel記錄行操作方法,以行索引和行元素列表為參數,對一行元素進行操作,元素為String類型* @param filename* @return* @throws Exception*/public ProcessTransDetailDataDto processAllSheet(String filename) throws Exception {OPCPackage pkg = OPCPackage.open(filename, PackageAccess.READ);ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(pkg);XSSFReader xssfReader = new XSSFReader(pkg);StylesTable styles = xssfReader.getStylesTable();XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) xssfReader.getSheetsData();InputStream stream = null;while (iter.hasNext()) {try {stream = iter.next();parserSheetXml(styles, strings, new SheetToCSV(), stream);} catch (Exception e) {logger.error("parserSheetXml error: ",e);} finally {stream.close();}}return processTransDetailData;}/*** 支持遍歷同一個excle文件下多個sheet的解析* excel記錄行操作方法,以行索引和行元素列表為參數,對一行元素進行操作,元素為String類型* @param xlsxFile* @return* @throws Exception* @author nevin.zhang*/public ProcessTransDetailDataDto processAllSheet(CommonsMultipartFile xlsxFile) throws Exception {OPCPackage pkg = OPCPackage.open(xlsxFile.getInputStream());ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(pkg);XSSFReader xssfReader = new XSSFReader(pkg);StylesTable styles = xssfReader.getStylesTable();XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) xssfReader.getSheetsData();InputStream stream = null;while (iter.hasNext()) {try {stream = iter.next();parserSheetXml(styles, strings, new SheetToCSV(), stream);} catch (Exception e) {logger.error("parserSheetXml error: ",e);} finally {stream.close();}}return processTransDetailData;}/*** 解析excel 轉換成xml* * @param styles* @param strings* @param sheetHandler* @param sheetInputStream* @throws IOException* @throws SAXException*/public void parserSheetXml(StylesTable styles, ReadOnlySharedStringsTable strings, SheetContentsHandler sheetHandler, InputStream sheetInputStream) throws IOException, SAXException {DataFormatter formatter = new DataFormatter();InputSource sheetSource = new InputSource(sheetInputStream);try {XMLReader sheetParser = SAXHelper.newXMLReader();ContentHandler handler = new XSSFSheetXMLHandler(styles, null, strings, sheetHandler, formatter, false);sheetParser.setContentHandler(handler);sheetParser.parse(sheetSource);} catch (ParserConfigurationException e) {throw new RuntimeException("SAX parser appears to be broken - " + e);}}/*** 讀取excel行、列值* * @author nevin.zhang*/private class SheetToCSV implements SheetContentsHandler {private boolean firstCellOfRow = false;private int currentRowNumber = -1;private int currentColNumber = -1;/*** 處理cell中為空值的情況* @param number*/private void processCellBlankCells(int number) {for (int i = 0; i < number; i++) {for (int j = 0; j < minColumns; j++) {rowStrs.append("|@|");}rowStrs.append('\n');}}@Overridepublic void startRow(int rowNum) {processCellBlankCells(rowNum - currentRowNumber - 1);firstCellOfRow = true;currentRowNumber = rowNum;currentColNumber = -1;}@Overridepublic void endRow(int rowNum) {for (int i = currentColNumber; i < minColumns; i++) {rowStrs.append("|@|");}// 從設置的rowIndex的行數開始加入到list,前三行為標題,多個sheet都從第三行開始讀取的數據加入到listString endRowStrs=rowStrs.toString();if(currentRowNumber>rowIndex && !rowStrs.toString().equals("|@|")){processTransDetailData.contentList.add(endRowStrs);}if(!rowStrs.toString().equals("|@|")) {processTransDetailData.processTransTotalData(endRowStrs, currentRowNumber);}rowStrs.delete(0, rowStrs.length());// 清空buffer}@Overridepublic void cell(String cellReference, String cellValue, XSSFComment comment) {if (firstCellOfRow) {firstCellOfRow = false;} else {rowStrs.append("|@|");}if (cellReference == null) {cellReference = new CellAddress(currentRowNumber, currentColNumber).formatAsString();}int thisCol = (new CellReference(cellReference)).getCol();int missedCols = thisCol - currentColNumber - 1;for (int i = 0; i < missedCols; i++) {// excel中為空的值設置為“|@|”rowStrs.append("|@|");}currentColNumber = thisCol;rowStrs.append(cellValue);}@Overridepublic void headerFooter(String text, boolean isHeader, String tagName) {}} }
由于導入的excel文件的格式是:第一行為匯總數據的標題,第二行為匯總數據,第三行為明細行的標題,其余行為明細數據,所以需要一個ProcessTransDetailDataDto實體類
來封裝一下。
上述代碼考慮到了原excel文件空行和空值的問題,做了處理。在開發的時候踩了一個坑:將空值的處理成“,”導致了其他數據帶逗號的時候,數據和列不匹配的問題,所以本文中就將逗號寫成了“ |@|”,盡量的避免偶然性。
下面貼上依賴的POI的jar包中的最核心源碼,供各位參考:
/* ====================================================================Licensed to the Apache Software Foundation (ASF) under one or morecontributor license agreements. See the NOTICE file distributed withthis work for additional information regarding copyright ownership.The ASF licenses this file to You under the Apache License, Version 2.0(the "License"); you may not use this file except in compliance withthe License. You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License. ==================================================================== */ package org.apache.poi.xssf.eventusermodel;import static org.apache.poi.xssf.usermodel.XSSFRelation.NS_SPREADSHEETML;import java.util.LinkedList; import java.util.Queue;import org.apache.poi.ss.usermodel.BuiltinFormats; import org.apache.poi.ss.usermodel.DataFormatter; import org.apache.poi.ss.util.CellAddress; import org.apache.poi.util.POILogFactory; import org.apache.poi.util.POILogger; import org.apache.poi.xssf.model.CommentsTable; import org.apache.poi.xssf.model.StylesTable; import org.apache.poi.xssf.usermodel.XSSFCellStyle; import org.apache.poi.xssf.usermodel.XSSFComment; import org.apache.poi.xssf.usermodel.XSSFRichTextString; import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTComment; import org.xml.sax.Attributes; import org.xml.sax.SAXException; import org.xml.sax.helpers.DefaultHandler;/*** This class handles the processing of a sheet#.xml * sheet part of a XSSF .xlsx file, and generates* row and cell events for it.*/ public class XSSFSheetXMLHandler extends DefaultHandler {private static final POILogger logger = POILogFactory.getLogger(XSSFSheetXMLHandler.class);/*** These are the different kinds of cells we support.* We keep track of the current one between* the start and end.*/enum xssfDataType {BOOLEAN,ERROR,FORMULA,INLINE_STRING,SST_STRING,NUMBER,}/*** Table with the styles used for formatting*/private StylesTable stylesTable;/*** Table with cell comments*/private CommentsTable commentsTable;/*** Read only access to the shared strings table, for looking* up (most) string cell's contents*/private ReadOnlySharedStringsTable sharedStringsTable;/*** Where our text is going*/private final SheetContentsHandler output;// Set when V start element is seenprivate boolean vIsOpen;// Set when F start element is seenprivate boolean fIsOpen;// Set when an Inline String "is" is seenprivate boolean isIsOpen;// Set when a header/footer element is seenprivate boolean hfIsOpen;// Set when cell start element is seen;// used when cell close element is seen.private xssfDataType nextDataType;// Used to format numeric cell values.private short formatIndex;private String formatString;private final DataFormatter formatter;private int rowNum;private int nextRowNum; // some sheets do not have rowNums, Excel can read them so we should try to handle them correctly as wellprivate String cellRef;private boolean formulasNotResults;// Gathers characters as they are seen.private StringBuffer value = new StringBuffer();private StringBuffer formula = new StringBuffer();private StringBuffer headerFooter = new StringBuffer();private Queue<CellAddress> commentCellRefs;/*** Accepts objects needed while parsing.** @param styles Table of styles* @param strings Table of shared strings*/public XSSFSheetXMLHandler(StylesTable styles,CommentsTable comments,ReadOnlySharedStringsTable strings,SheetContentsHandler sheetContentsHandler,DataFormatter dataFormatter,boolean formulasNotResults) {this.stylesTable = styles;this.commentsTable = comments;this.sharedStringsTable = strings;this.output = sheetContentsHandler;this.formulasNotResults = formulasNotResults;this.nextDataType = xssfDataType.NUMBER;this.formatter = dataFormatter;init();}/*** Accepts objects needed while parsing.** @param styles Table of styles* @param strings Table of shared strings*/public XSSFSheetXMLHandler(StylesTable styles,ReadOnlySharedStringsTable strings,SheetContentsHandler sheetContentsHandler,DataFormatter dataFormatter,boolean formulasNotResults) {this(styles, null, strings, sheetContentsHandler, dataFormatter, formulasNotResults);}/*** Accepts objects needed while parsing.** @param styles Table of styles* @param strings Table of shared strings*/public XSSFSheetXMLHandler(StylesTable styles,ReadOnlySharedStringsTable strings,SheetContentsHandler sheetContentsHandler,boolean formulasNotResults) {this(styles, strings, sheetContentsHandler, new DataFormatter(), formulasNotResults);}private void init() {if (commentsTable != null) {commentCellRefs = new LinkedList<CellAddress>();for (CTComment comment : commentsTable.getCTComments().getCommentList().getCommentArray()) {commentCellRefs.add(new CellAddress(comment.getRef()));}} }private boolean isTextTag(String name) {if("v".equals(name)) {// Easy, normal v text tagreturn true;}if("inlineStr".equals(name)) {// Easy inline stringreturn true;}if("t".equals(name) && isIsOpen) {// Inline string <is><t>...</t></is> pairreturn true;}// It isn't a text tagreturn false;}@Override@SuppressWarnings("unused")public void startElement(String uri, String localName, String qName,Attributes attributes) throws SAXException {if (uri != null && ! uri.equals(NS_SPREADSHEETML)) {return;}if (isTextTag(localName)) {vIsOpen = true;// Clear contents cachevalue.setLength(0);} else if ("is".equals(localName)) {// Inline string outer tagisIsOpen = true;} else if ("f".equals(localName)) {// Clear contents cacheformula.setLength(0);// Mark us as being a formula if not alreadyif(nextDataType == xssfDataType.NUMBER) {nextDataType = xssfDataType.FORMULA;}// Decide where to get the formula string fromString type = attributes.getValue("t");if(type != null && type.equals("shared")) {// Is it the one that defines the shared, or uses it?String ref = attributes.getValue("ref");String si = attributes.getValue("si");if(ref != null) {// This one defines it// TODO Save it somewherefIsOpen = true;} else {// This one uses a shared formula// TODO Retrieve the shared formula and tweak it to // match the current cellif(formulasNotResults) {logger.log(POILogger.WARN, "shared formulas not yet supported!");} else {// It's a shared formula, so we can't get at the formula string yet// However, they don't care about the formula string, so that's ok!}}} else {fIsOpen = true;}}else if("oddHeader".equals(localName) || "evenHeader".equals(localName) ||"firstHeader".equals(localName) || "firstFooter".equals(localName) ||"oddFooter".equals(localName) || "evenFooter".equals(localName)) {hfIsOpen = true;// Clear contents cacheheaderFooter.setLength(0);}else if("row".equals(localName)) {String rowNumStr = attributes.getValue("r");if(rowNumStr != null) {rowNum = Integer.parseInt(rowNumStr) - 1;} else {rowNum = nextRowNum;}output.startRow(rowNum);}// c => cellelse if ("c".equals(localName)) {// Set up defaults.this.nextDataType = xssfDataType.NUMBER;this.formatIndex = -1;this.formatString = null;cellRef = attributes.getValue("r");String cellType = attributes.getValue("t");String cellStyleStr = attributes.getValue("s");if ("b".equals(cellType))nextDataType = xssfDataType.BOOLEAN;else if ("e".equals(cellType))nextDataType = xssfDataType.ERROR;else if ("inlineStr".equals(cellType))nextDataType = xssfDataType.INLINE_STRING;else if ("s".equals(cellType))nextDataType = xssfDataType.SST_STRING;else if ("str".equals(cellType))nextDataType = xssfDataType.FORMULA;else {// Number, but almost certainly with a special style or formatXSSFCellStyle style = null;if (stylesTable != null) {if (cellStyleStr != null) {int styleIndex = Integer.parseInt(cellStyleStr);style = stylesTable.getStyleAt(styleIndex);} else if (stylesTable.getNumCellStyles() > 0) {style = stylesTable.getStyleAt(0);}}if (style != null) {this.formatIndex = style.getDataFormat();this.formatString = style.getDataFormatString();if (this.formatString == null)this.formatString = BuiltinFormats.getBuiltinFormat(this.formatIndex);}}}}@Overridepublic void endElement(String uri, String localName, String qName)throws SAXException {if (uri != null && ! uri.equals(NS_SPREADSHEETML)) {return;}String thisStr = null;// v => contents of a cellif (isTextTag(localName)) {vIsOpen = false;// Process the value contents as required, now we have it allswitch (nextDataType) {case BOOLEAN:char first = value.charAt(0);thisStr = first == '0' ? "FALSE" : "TRUE";break;case ERROR:thisStr = "ERROR:" + value.toString();break;case FORMULA:if(formulasNotResults) {thisStr = formula.toString();} else {String fv = value.toString();if (this.formatString != null) {try {// Try to use the value as a formattable numberdouble d = Double.parseDouble(fv);thisStr = formatter.formatRawCellContents(d, this.formatIndex, this.formatString);} catch(NumberFormatException e) {// Formula is a String result not a Numeric onethisStr = fv;}} else {// No formating applied, just do raw value in all casesthisStr = fv;}}break;case INLINE_STRING:// TODO: Can these ever have formatting on them?XSSFRichTextString rtsi = new XSSFRichTextString(value.toString());thisStr = rtsi.toString();break;case SST_STRING:String sstIndex = value.toString();try {int idx = Integer.parseInt(sstIndex);XSSFRichTextString rtss = new XSSFRichTextString(sharedStringsTable.getEntryAt(idx));thisStr = rtss.toString();}catch (NumberFormatException ex) {logger.log(POILogger.ERROR, "Failed to parse SST index '" + sstIndex, ex);}break;case NUMBER:String n = value.toString();if (this.formatString != null && n.length() > 0)thisStr = formatter.formatRawCellContents(Double.parseDouble(n), this.formatIndex, this.formatString);elsethisStr = n;break;default:thisStr = "(TODO: Unexpected type: " + nextDataType + ")";break;}// Do we have a comment for this cell?checkForEmptyCellComments(EmptyCellCommentsCheckType.CELL);XSSFComment comment = commentsTable != null ? commentsTable.findCellComment(new CellAddress(cellRef)) : null;// Outputoutput.cell(cellRef, thisStr, comment);} else if ("f".equals(localName)) {fIsOpen = false;} else if ("is".equals(localName)) {isIsOpen = false;} else if ("row".equals(localName)) {// Handle any "missing" cells which had comments attachedcheckForEmptyCellComments(EmptyCellCommentsCheckType.END_OF_ROW);// Finish up the rowoutput.endRow(rowNum);// some sheets do not have rowNum set in the XML, Excel can read them so we should try to read them as wellnextRowNum = rowNum + 1;} else if ("sheetData".equals(localName)) {// Handle any "missing" cells which had comments attachedcheckForEmptyCellComments(EmptyCellCommentsCheckType.END_OF_SHEET_DATA);}else if("oddHeader".equals(localName) || "evenHeader".equals(localName) ||"firstHeader".equals(localName)) {hfIsOpen = false;output.headerFooter(headerFooter.toString(), true, localName);}else if("oddFooter".equals(localName) || "evenFooter".equals(localName) ||"firstFooter".equals(localName)) {hfIsOpen = false;output.headerFooter(headerFooter.toString(), false, localName);}}/*** Captures characters only if a suitable element is open.* Originally was just "v"; extended for inlineStr also.*/@Overridepublic void characters(char[] ch, int start, int length)throws SAXException {if (vIsOpen) {value.append(ch, start, length);}if (fIsOpen) {formula.append(ch, start, length);}if (hfIsOpen) {headerFooter.append(ch, start, length);}}/*** Do a check for, and output, comments in otherwise empty cells.*/private void checkForEmptyCellComments(EmptyCellCommentsCheckType type) {if (commentCellRefs != null && !commentCellRefs.isEmpty()) {// If we've reached the end of the sheet data, output any// comments we haven't yet already handledif (type == EmptyCellCommentsCheckType.END_OF_SHEET_DATA) {while (!commentCellRefs.isEmpty()) {outputEmptyCellComment(commentCellRefs.remove());}return;}// At the end of a row, handle any comments for "missing" rows before usif (this.cellRef == null) {if (type == EmptyCellCommentsCheckType.END_OF_ROW) {while (!commentCellRefs.isEmpty()) {if (commentCellRefs.peek().getRow() == rowNum) {outputEmptyCellComment(commentCellRefs.remove());} else {return;}}return;} else {throw new IllegalStateException("Cell ref should be null only if there are only empty cells in the row; rowNum: " + rowNum);}}CellAddress nextCommentCellRef;do {CellAddress cellRef = new CellAddress(this.cellRef);CellAddress peekCellRef = commentCellRefs.peek();if (type == EmptyCellCommentsCheckType.CELL && cellRef.equals(peekCellRef)) {// remove the comment cell ref from the list if we're about to handle it alongside the cell contentcommentCellRefs.remove();return;} else {// fill in any gaps if there are empty cells with comment mixed in with non-empty cellsint comparison = peekCellRef.compareTo(cellRef);if (comparison > 0 && type == EmptyCellCommentsCheckType.END_OF_ROW && peekCellRef.getRow() <= rowNum) {nextCommentCellRef = commentCellRefs.remove();outputEmptyCellComment(nextCommentCellRef);} else if (comparison < 0 && type == EmptyCellCommentsCheckType.CELL && peekCellRef.getRow() <= rowNum) {nextCommentCellRef = commentCellRefs.remove();outputEmptyCellComment(nextCommentCellRef);} else {nextCommentCellRef = null;}}} while (nextCommentCellRef != null && !commentCellRefs.isEmpty());}}/*** Output an empty-cell comment.*/private void outputEmptyCellComment(CellAddress cellRef) {XSSFComment comment = commentsTable.findCellComment(cellRef);output.cell(cellRef.formatAsString(), null, comment);}private enum EmptyCellCommentsCheckType {CELL,END_OF_ROW,END_OF_SHEET_DATA}/*** You need to implement this to handle the results* of the sheet parsing.*/public interface SheetContentsHandler {/** A row with the (zero based) row number has started */public void startRow(int rowNum);/** A row with the (zero based) row number has ended */public void endRow(int rowNum);/** * A cell, with the given formatted value (may be null), * and possibly a comment (may be null), was encountered */public void cell(String cellReference, String formattedValue, XSSFComment comment);/** A header or footer has been encountered */public void headerFooter(String text, boolean isHeader, String tagName);} }本文在性能測試時處理100W數據的excel文件的時候,“獲取前臺導入的文件并得到分類匯總的文件數據和明細的文件數據”只需要25秒,這一點上性能已經很高了,本系統中的性能瓶頸已經完美解決了。
總結
以上是生活随笔為你收集整理的超大数据量的xlsx格式的excel文件的读取和解析,解决了POI方式的内存溢出和性能问题的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 就只想要几个c币,单独够买又不行,哎
- 下一篇: 电脑上最好用的几款azw/azw3阅读器