Reading and parsing very large xlsx Excel files: solving the out-of-memory and performance problems of the POI approach

Published: 2024/1/1

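In my earlier post 《POI讀取并解析xlsx格式的excel文件》 (reading and parsing xlsx Excel files with POI), small files were handled without any trouble. But once the Excel file reaches millions of rows, this code

InputStream is = files[i].getInputStream();
XSSFWorkbook xssFWorkbook = new XSSFWorkbook(is);

runs out of memory on the second line, where the XSSFWorkbook is constructed. Nothing I tried could rescue it, so a different route was needed.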

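After two days of Googling and experimenting, I finally found the solution on the POI website. Allow me a small complaint about POI here: the fix for such a bottleneck is buried deep, behind a single inconspicuous link. Are they worried it would be embarrassing if everyone knew about it?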

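The main idea of the approach is as follows. The very large Excel file is uploaded from a page (nginx's maximum request body size has to be raised: client_max_body_size xxxm), and the backend receives it as a CommonsMultipartFile. First obtain the file's InputStream and open it with OPCPackage. An xlsx file is a zip package of XML parts, so the sheet data can be read as XML and streamed through a SAX parser instead of being loaded into a full in-memory workbook model, and that is what avoids the out-of-memory error. The XML tags tell you whether a piece of data is formatting, a header or cell content. POI's XSSFSheetXMLHandler class parses the content according to those tags and reports it through its SheetContentsHandler interface. The parsed content can be stored in a list or map for later business processing (content only; in my own tests several million rows were handled without running out of memory). Depending on the business requirements, you implement the four methods of SheetContentsHandler: startRow, endRow, cell and headerFooter.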
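The event-driven idea described above can be illustrated without POI. The following is a minimal sketch, not the POI API: it uses only the JDK's SAX parser on a small hand-written fragment shaped like sheet XML (row, c and v elements), and the names SheetSaxSketch, RowCollector and parse are hypothetical. Real sheet XML also carries shared-string indexes, styles and missing cells, which XSSFSheetXMLHandler deals with.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical demo class: streams a sheet-like XML document row by row. */
class SheetSaxSketch {

    /** Collects each row as one "|@|"-delimited string; only the current row is buffered. */
    static class RowCollector extends DefaultHandler {
        final List<String> rows = new ArrayList<>();
        private final StringBuilder row = new StringBuilder();
        private final StringBuilder text = new StringBuilder();
        private boolean firstCell;
        private boolean inValue;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("row".equals(qName)) {          // comparable to startRow
                row.setLength(0);
                firstCell = true;
            } else if ("v".equals(qName)) {     // a cell value begins
                text.setLength(0);
                inValue = true;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inValue) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("v".equals(qName)) {            // comparable to cell
                inValue = false;
                if (!firstCell) {
                    row.append("|@|");
                }
                firstCell = false;
                row.append(text);
            } else if ("row".equals(qName)) {   // comparable to endRow
                rows.add(row.toString());
            }
        }
    }

    /** Parses the XML text and returns one delimited string per row. */
    static List<String> parse(String xml) {
        RowCollector collector = new RowCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), collector);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return collector.rows;
    }

    public static void main(String[] args) {
        String xml = "<sheetData>"
                + "<row r=\"1\"><c r=\"A1\"><v>M001</v></c><c r=\"B1\"><v>12.50</v></c></row>"
                + "<row r=\"2\"><c r=\"A2\"><v>M002</v></c><c r=\"B2\"><v>7</v></c></row>"
                + "</sheetData>";
        for (String r : parse(xml)) {
            System.out.println(r);
        }
    }
}
```

The parser never holds more than the current row's text, which is why this style scales to large sheets; the list of finished rows is the only structure that grows with the file.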

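I also tested a second approach: after opening the stream with OPCPackage and reading the sheet XML, write all of the original Excel data in batches through a buffered stream to a local txt file, then readLine that file in batches for business processing. The advantage is that the imported transaction file physically lands on disk and can serve as evidence for later audits. The drawback is the extra dump to disk, which makes the import take longer. Which one to choose depends on your business requirements.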
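A minimal sketch of this second approach, using only the JDK: rows are written one per line through a BufferedWriter and then read back with readLine in fixed-size batches. The names SpoolFileSketch, spool, readInBatches and demo are hypothetical, and the sketch assumes no cell value contains a line break. In a real import the rows would be written from the endRow callback rather than from a list, and the file would be kept instead of deleted.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical demo class: spool rows to a text file, then process them in batches. */
class SpoolFileSketch {

    /** Writes one row per line through a buffered writer. */
    static void spool(Iterable<String> rows, Path file) {
        try (BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (String row : rows) {
                writer.write(row);
                writer.newLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the file line by line and hands over at most batchSize rows at a time. */
    static int readInBatches(Path file, int batchSize, Consumer<List<String>> handler) {
        int batches = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            List<String> batch = new ArrayList<>(batchSize);
            String line;
            while ((line = reader.readLine()) != null) {
                batch.add(line);
                if (batch.size() == batchSize) {
                    handler.accept(batch);
                    batches++;
                    batch = new ArrayList<>(batchSize);
                }
            }
            if (!batch.isEmpty()) {          // last, partially filled batch
                handler.accept(batch);
                batches++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return batches;
    }

    /** Spools rowCount generated rows to a temp file and returns the size of each batch read back. */
    static List<Integer> demo(int rowCount, int batchSize) {
        try {
            Path file = Files.createTempFile("trans-import", ".txt");
            try {
                List<String> rows = new ArrayList<>();
                for (int i = 1; i <= rowCount; i++) {
                    rows.add("row" + i + "|@|" + i);
                }
                spool(rows, file);
                List<Integer> sizes = new ArrayList<>();
                readInBatches(file, batchSize, batch -> sizes.add(batch.size()));
                return sizes;
            } finally {
                Files.deleteIfExists(file);  // demo only; an audit copy would be kept
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(5, 2));
    }
}
```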

This post focuses on the first approach. Without further ado, here is the code:

/**
 * Transaction import.
 *
 * @return
 * @throws Exception
 * @author jason.gao
 */
@RequestMapping(value = "/transDetail/upload", method = {RequestMethod.POST, RequestMethod.GET})
@RequestGuard(perm = "transaction.import.upload")
public ResponseEntity<ResponseEnvelope<RestApiResp>> uploadFile(@RequestParam("file") CommonsMultipartFile[] files,
        HttpServletRequest req, HttpServletResponse resp) throws IOException {
    // Validate before touching files[0]; the original logged files[0] first, which fails on a null or empty array
    if (null == files || files.length != 1) {
        return RestApiResp.getSuccResponseEntity("Exactly one file must be uploaded", null);
    }
    logger.info("uploadFile == >upload button start; fileName:[{}], CommonsMultipartFile[]:[{}]", files[0].getFileItem().getName(), files);
    long start = System.currentTimeMillis();
    String result = "Transaction file import completed!";
    // Shrink the buffer so that bytes written to resp at slow steps reach the client
    // and the front-end ajax request does not time out
    resp.setBufferSize(1);
    ServletOutputStream out = resp.getOutputStream();
    XlsxProcessAbstract xlsxProcess = new XlsxProcessAbstract();
    long getFileAndDataTime;
    ProcessTransDetailDataDto data;
    try {
        // DTO holding the total amounts and counts of payments/refunds accumulated from the detail rows
        data = xlsxProcess.processAllSheet(files[0]);
        logger.info("Summary row data: [{}]", data.dtoToString());
        // Summary and detail rows (including the header rows)
        List<String> contentList = data.contentList;
        logger.info("Number of detail rows: [{}]", JSON.toJSONString(contentList.size() - 3));
        getFileAndDataTime = System.currentTimeMillis();
        logger.info("File received and data extracted. Elapsed: [{}] seconds", (getFileAndDataTime - start) / 1000);
        // Check the summary row against the detail rows
        checkDetailSummary(contentList, data, out);
        logger.info("Summary row validation passed!");
        // Insert through OSP in batches
        String handleResult = doOspHandle(contentList, data, out);
        if (!handleResult.equals(TransImportJobStatus.Success.getValue())) {
            result = TransImportJobStatus.getDescByKey(handleResult);
            logger.error(result);
        }
    } catch (CellDataException e) {
        logger.error("CellDataException: Error:[{}]", e);
        return RestApiResp.getSuccResponseEntity(e.getMessage(), null);
    } catch (OspException e) {
        logger.error("OspException: [{}]", e);
        return RestApiResp.getSuccResponseEntity(e.getMessage(), null);
    } catch (IOException e) {
        logger.error("IOException: [{}]", e);
        return RestApiResp.getSuccResponseEntity(e.getMessage(), null);
    } catch (Exception e) {
        logger.error("Unknown exception: [{}]", e);
        return RestApiResp.getSuccResponseEntity("Unknown exception, please check the logs: " + e.getMessage(), null);
    }
    long finishCheckAndInsertTime = System.currentTimeMillis();
    logger.info("Data validation and batch insert finished. Elapsed: [{}] seconds", (finishCheckAndInsertTime - getFileAndDataTime) / 1000);
    logger.info("[{}], total backend processing time: [{}] seconds", result, (finishCheckAndInsertTime - start) / 1000);
    return RestApiResp.getSuccResponseEntity(result, HttpStatus.OK);
}

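The code block above is the main backend flow. Take care to catch exceptions thoroughly and surface the error information to both the front-end page and the logs, so that production failures are easy to diagnose.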

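The next four code blocks handle the business logic for the fields in the Excel file. They are business-specific, so skip them if that part does not interest you.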

public String doOspHandle(List<String> contentList, ProcessTransDetailDataDto data, ServletOutputStream out)
        throws CellDataException, OspException, IOException {
    // Detail rows of the workbook start at index 3 (after the three header/summary rows)
    int start = 3;
    int size = 1000;
    String importStatus = "";
    // Call the OSP interface in batches to insert the rows
    while (start < contentList.size()) {
        importStatus = handleTransImport(contentList, start, size, data);
        if (!importStatus.equals(TransImportJobStatus.Success.getValue())) {
            logger.error("Batch OSP call failed for rows [{}] to [{}]", start + 1, start + size + 1);
            return importStatus;
        }
        start += size;
        // Write a byte to the response after each batch so the connection stays alive
        out.write(" ".getBytes());
        out.flush();
    }
    // Final state: all transactions imported successfully
    if (importStatus.equals(TransImportJobStatus.Success.getValue())) {
        logger.info("OSP \"transaction detail import\" calls succeeded!");
        TransDetailResp confirmResp;
        OspTransDetailServiceHelper.OspTransDetailServiceClient ospTransDetailServiceClient = new OspTransDetailServiceHelper.OspTransDetailServiceClient();
        logger.info("Request parameters for the OSP \"confirm transaction details\" call: merchantId=[{}], settleOrderNo=[{}], total=[{}]", data.getMerchantId(), data.getSettleOrderNo(), contentList.size() - 3);
        try {
            confirmResp = ospTransDetailServiceClient.transDetailConfirm(data.getMerchantId(), data.getSettleOrderNo(), contentList.size() - 3);
        } catch (OspException e) {
            logger.error("OSP \"confirm transaction details\" call threw an exception! [{}]", e);
            throw e;
        } finally {
            out.write("".getBytes());
            out.flush();
        }
        logger.info("Response of the OSP \"confirm transaction details\" call: {}", JSON.toJSONString(confirmResp));
        if (!confirmResp.getResponseCode().equals(MessageEnum.SUCCESS.getValue())) {
            throw new OspException(TransImpFileExceptEnums.OspTransDetailConfirm.getValue(), TransImpFileExceptEnums.OspTransDetailConfirm.getDesc());
        }
    }
    return importStatus;
}

/**
 * Calls the OSP interface to import one batch of transaction details.
 * Returns the status of the OSP operation.
 */
public String handleTransImport(List<String> contentList, int start, int size, ProcessTransDetailDataDto data)
        throws CellDataException, OspException {
    // Call the OSP interface batch by batch to import transaction details
    OspTransDetailServiceHelper.OspTransDetailServiceClient ospTransDetailServiceClient = new OspTransDetailServiceHelper.OspTransDetailServiceClient();
    TransDetailResp transDetailResp;
    List<TransDetailImport> transDetailImportList = new ArrayList<>();
    // Build one list from the rows start .. start+size
    for (int i = start; i < start + size && i < contentList.size(); i++) {
        TransDetailImport transDetailImport = new TransDetailImport();
        String row = contentList.get(i);
        // The original tested the split array (detailRow != null || !detailRow.equals("")), which is always true,
        // so the else branch could never run; test the raw row instead
        if (row != null && !row.isEmpty()) {
            // limit -1 keeps trailing empty cells, so the column indexes below stay valid
            String[] detailRow = row.split("\\|@\\|", -1);
            try {
                transDetailImport.setMerchantId(data.getMerchantId());
                transDetailImport.setMerchantName(data.getMerchantName());
                transDetailImport.setMerchantBatchNo(data.getSettleOrderNo()); // merchant batch number
                transDetailImport.setMerchantBatchSerialNo(XssfCellValueCheckHelper.getStringNotEmpty(detailRow[0], i, 0)); // serial number within the batch (the detail row number in the import template): required
                transDetailImport.setMerchantOrderNo(XssfCellValueCheckHelper.getStringNotEmpty(detailRow[1], i, 1)); // merchant order number: required
                transDetailImport.setPlatformOrderNo(XssfCellValueCheckHelper.getRealOrDefaultValue(detailRow[2], detailRow[1])); // platform order number (payment/refund order number): defaults to the merchant order number
                // merchant transaction date: defaults to the upload date (Date(String) is deprecated; kept from the original)
                transDetailImport.setMerchantTransDate(detailRow[4].equals("") ? new Date() : new Date(detailRow[4]));
                transDetailImport.setTransType(XssfCellValueCheckHelper.getRealOrDefaultValue(detailRow[5], TransTypeEnums.Payment.getValue())); // transaction type: defaults to payment
                transDetailImport.setOriginOrderNo(XssfCellValueCheckHelper.checkAndGetOriginOrderNo(detailRow[3], transDetailImport.getTransType(), i)); // original payment order number
                transDetailImport.setCurrency(XssfCellValueCheckHelper.getRealOrDefaultValue(detailRow[6], "CNY")); // currency: three-letter code, defaults to CNY
                transDetailImport.setAmount(XssfCellValueCheckHelper.getAmount(detailRow[7], i)); // amount: uploaded amount for external transactions, merchant order amount for internal ones
                transDetailImport.setCustomerName(XssfCellValueCheckHelper.getStringNotEmpty(detailRow[9], i, 9)); // customer name: required
                transDetailImport.setIdType(XssfCellValueCheckHelper.getStringNotEmpty(detailRow[10], i, 10)); // ID type: required
                transDetailImport.setCustomerType(XssfCellValueCheckHelper.checkAndGetCustomerType(detailRow, i, 8)); // customer type: derived from the ID type
                transDetailImport.setIdNo(XssfCellValueCheckHelper.getStringNotEmpty(detailRow[11], i, 11)); // ID number: required
                transDetailImport.setMoneyType(XssfCellValueCheckHelper.getRealOrDefaultValue(detailRow[12], MoneyTypeEnums.Currency.getValue())); // money type: defaults to A (prepayment)
                transDetailImport.setIsPayUnderBonded(XssfCellValueCheckHelper.getRealOrDefaultValue(detailRow[13], IsPayUnderBondedEnums.YES.getValue())); // payment under bonded goods: defaults to yes
                transDetailImport.setTradingCode(XssfCellValueCheckHelper.getRealOrDefaultValue(detailRow[14], TradingCodeEnums.GoodsTrade.getValue())); // trading code: defaults to 122030 (trade in goods)
                transDetailImport.setRmbAccount(XssfCellValueCheckHelper.getRealOrDefaultValue(detailRow[15], "")); // RMB account
                transDetailImport.setTrmo(XssfCellValueCheckHelper.getRealOrDefaultValue(detailRow[16], "一般貿易")); // transaction remark: defaults to "一般貿易" (general trade)
                transDetailImport.setProductDesc(XssfCellValueCheckHelper.getStringNotEmpty(detailRow[17], i, 17)); // product description
                transDetailImport.setWaybillNum(XssfCellValueCheckHelper.getStringNotEmpty(detailRow[18], i, 18)); // waybill number
                transDetailImport.setProductNum(XssfCellValueCheckHelper.getStringNotEmpty(detailRow[19], i, 19).trim()); // quantity sold
                transDetailImport.setTransFrom("3"); // source: 1 payment engine, 2 portal import, 3 operations console import, 4 external merchant import
            } catch (Exception e) {
                logger.error("Data conversion error while assembling rows; calling the OSP \"delete transaction details\" interface to roll back the rows already inserted. Request parameters: [{}],[{}], details: [{}]", data.getMerchantId(), data.getSettleOrderNo(), e);
                TransDetailResp delResp = ospTransDetailServiceClient.transDetailDel(data.getMerchantId(), data.getSettleOrderNo());
                if (!delResp.getResponseCode().equals(MessageEnum.SUCCESS.getValue())) {
                    throw new OspException(TransImpFileExceptEnums.OspTransDetailDelRrror.getValue(), TransImpFileExceptEnums.OspTransDetailDelRrror.getDesc());
                }
                return TransImportJobStatus.InsetFailAndRollBack.getValue();
            }
        } else {
            throw new CellDataException(TransImpFileExceptEnums.EmptyHeadLine.getValue(), TransImpFileExceptEnums.EmptyHeadLine.getDesc());
        }
        transDetailImportList.add(transDetailImport);
    }
    logger.info("Calling the OSP \"import transaction details\" interface for rows [{}] to [{}]", start, (start + size) < contentList.size() ? (start + size) : contentList.size());
    try {
        transDetailResp = ospTransDetailServiceClient.transDetailImport(transDetailImportList);
    } catch (OspException e) {
        logger.error("OSP \"import transaction details\" call threw an exception! [{}]", e);
        logger.info("Request parameters for the OSP \"delete transaction details\" call: [{}],[{}]", data.getMerchantId(), data.getSettleOrderNo());
        TransDetailResp delResp = ospTransDetailServiceClient.transDetailDel(data.getMerchantId(), data.getSettleOrderNo());
        if (!delResp.getResponseCode().equals(MessageEnum.SUCCESS.getValue())) {
            throw new OspException(TransImpFileExceptEnums.OspTransDetailDelRrror.getValue(), TransImpFileExceptEnums.OspTransDetailDelRrror.getDesc());
        }
        return TransImportJobStatus.InsetFailAndRollBack.getValue();
    }
    logger.info("Response of the OSP \"import transaction details\" call: {}", JSON.toJSONString(transDetailResp));
    if (!transDetailResp.getResponseCode().equals(MessageEnum.SUCCESS.getValue())) {
        logger.info("Request parameters for the OSP \"delete transaction details\" call: [{}],[{}]", data.getMerchantId(), data.getSettleOrderNo());
        TransDetailResp delResp;
        try {
            delResp = ospTransDetailServiceClient.transDetailDel(data.getMerchantId(), data.getSettleOrderNo());
        } catch (OspException e) {
            logger.error("OSP \"delete transaction details\" call threw an exception! [{}]", e);
            throw e;
        }
        logger.info("Response of the OSP \"delete transaction details\" call: {}", JSON.toJSONString(delResp));
        if (!delResp.getResponseCode().equals(MessageEnum.SUCCESS.getValue())) {
            throw new OspException(TransImpFileExceptEnums.OspTransDetailDelRrror.getValue(), TransImpFileExceptEnums.OspTransDetailDelRrror.getDesc());
        }
        return TransImportJobStatus.InsetFailAndRollBack.getValue();
    }
    return TransImportJobStatus.Success.getValue();
}

/**
 * Checks that every required field of the summary row is present.
 */
public void checkHeadNotEmpty(ProcessTransDetailDataDto dataDto) throws CellDataException {
    if (dataDto.getMerchantId() == null || dataDto.getMerchantId().equals("")) {
        throw new CellDataException(TransImpFileExceptEnums.HeadDataRrror.getValue(), TransImpFileExceptEnums.HeadDataRrror.setParams(1).getDesc());
    }
    if (dataDto.getMerchantName() == null || dataDto.getMerchantName().equals("")) {
        throw new CellDataException(TransImpFileExceptEnums.HeadDataRrror.getValue(), TransImpFileExceptEnums.HeadDataRrror.setParams(2).getDesc());
    }
    if (dataDto.getSettleOrderNo() == null || dataDto.getSettleOrderNo().equals("")) {
        throw new CellDataException(TransImpFileExceptEnums.HeadDataRrror.getValue(), TransImpFileExceptEnums.HeadDataRrror.setParams(3).getDesc());
    }
    if (dataDto.getTotalPaymentCount() == null || dataDto.getTotalPaymentCount().equals("")) {
        throw new CellDataException(TransImpFileExceptEnums.HeadDataRrror.getValue(), TransImpFileExceptEnums.HeadDataRrror.setParams(4).getDesc());
    }
    if (dataDto.getTotalPaymentAmount() == null || dataDto.getTotalPaymentAmount().equals("")) {
        throw new CellDataException(TransImpFileExceptEnums.HeadDataRrror.getValue(), TransImpFileExceptEnums.HeadDataRrror.setParams(5).getDesc());
    }
    if (dataDto.getTotalRefundCount() == null || dataDto.getTotalRefundCount().equals("")) {
        throw new CellDataException(TransImpFileExceptEnums.HeadDataRrror.getValue(), TransImpFileExceptEnums.HeadDataRrror.setParams(6).getDesc());
    }
    if (dataDto.getTotalRefundAmount() == null || dataDto.getTotalRefundAmount().equals("")) {
        throw new CellDataException(TransImpFileExceptEnums.HeadDataRrror.getValue(), TransImpFileExceptEnums.HeadDataRrror.setParams(7).getDesc());
    }
    if (dataDto.getNetTotalCount() == null || dataDto.getNetTotalCount().equals("")) {
        throw new CellDataException(TransImpFileExceptEnums.HeadDataRrror.getValue(), TransImpFileExceptEnums.HeadDataRrror.setParams(8).getDesc());
    }
    if (dataDto.getNetTotalAmount() == null || dataDto.getNetTotalAmount().equals("")) {
        throw new CellDataException(TransImpFileExceptEnums.HeadDataRrror.getValue(), TransImpFileExceptEnums.HeadDataRrror.setParams(9).getDesc());
    }
}

package com.vip.vpal.mgr.controller;

import com.vip.vpal.mgr.enums.IdTypeCustomerTypeEnums;
import com.vip.vpal.mgr.enums.TransImpFileExceptEnums;
import com.vip.vpal.mgr.enums.TransTypeEnums;
import com.vip.vpal.mgr.exception.CellDataException;

import java.math.BigDecimal;
import java.math.RoundingMode;

/**
 * Created by jason.gao on 2017/8/11 0011.
 */
public class XssfCellValueCheckHelper {

    public static String getStringNotEmpty(String cellValue, int row, int col) throws CellDataException {
        if (cellValue.equals("")) {
            throw new CellDataException(TransImpFileExceptEnums.EmptyFieldError.getValue(), TransImpFileExceptEnums.EmptyFieldError.setParams(row + 1, col + 1).getDesc());
        }
        return cellValue;
    }

    public static String getRealOrDefaultValue(String cellValue, String defValue) {
        if (cellValue.equals("")) {
            return defValue;
        }
        return cellValue;
    }

    /**
     * Checks that the original payment order number is present for refunds.
     * Returns the original payment order number.
     */
    public static String checkAndGetOriginOrderNo(String cellValue, String transType, int row) throws CellDataException {
        if (transType.equals(TransTypeEnums.Refund.getValue()) && cellValue.equals("")) {
            throw new CellDataException(TransImpFileExceptEnums.EmptyFieldError.getValue(), TransImpFileExceptEnums.EmptyFieldError.setParams(row + 1, 4).getDesc());
        }
        return cellValue; // original payment order number
    }

    /**
     * Checks that the ID type and the customer type are consistent.
     * Returns the customer type.
     */
    public static String checkAndGetCustomerType(String[] cellRow, int row, int cell) throws CellDataException {
        // Check that the ID type is valid
        String idType = cellRow[10];
        String customerTypeByDict = IdTypeCustomerTypeEnums.getDesc(idType);
        if (customerTypeByDict.equals("")) {
            throw new CellDataException(TransImpFileExceptEnums.DetailDateError.getValue(), TransImpFileExceptEnums.DetailDateError.setParams(row + 1, 11).getDesc());
        }
        // Check that the customer type matches the ID type
        String customerType = cellRow[8]; // customer type: optional, defaults to the one implied by the ID type
        if (customerType.equals("")) {
            return customerTypeByDict;
        }
        if (!customerTypeByDict.equals(customerType)) {
            throw new CellDataException(TransImpFileExceptEnums.DetailDateError.getValue(), TransImpFileExceptEnums.DetailDateError.setParams(row + 1, 9).getDesc());
        }
        return customerType;
    }

    public static long getAmount(String cellValue, int row) throws CellDataException {
        if (cellValue.equals("")) {
            throw new CellDataException(TransImpFileExceptEnums.EmptyFieldError.getValue(), TransImpFileExceptEnums.EmptyFieldError.setParams(row + 1, 8).getDesc());
        }
        // Convert to cents with BigDecimal. The original (long) (new Double(cellValue) * 100)
        // can lose a cent to floating-point error: "0.29" came out as 28.
        // RoundingMode.DOWN keeps the truncation of the original cast.
        return new BigDecimal(cellValue.trim()).movePointRight(2).setScale(0, RoundingMode.DOWN).longValue();
    }
}
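Converting a decimal amount string into cents is a place where binary floating point can silently drop a cent. The sketch below compares a conversion through double with an exact one through BigDecimal; the class and method names are hypothetical, and truncating extra decimal places (RoundingMode.DOWN) is an assumption chosen to mirror what a cast to long does.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Hypothetical demo class: two ways to turn an amount string into cents. */
class AmountSketch {

    /** Through double: the product can land just below the intended integer and the cast truncates it. */
    static long toCentsViaDouble(String amount) {
        return (long) (Double.parseDouble(amount) * 100);
    }

    /** Through BigDecimal: decimal arithmetic is exact, so no cent is lost. */
    static long toCentsExact(String amount) {
        return new BigDecimal(amount.trim())
                .movePointRight(2)                 // yuan -> cents
                .setScale(0, RoundingMode.DOWN)    // drop anything below one cent
                .longValue();
    }

    public static void main(String[] args) {
        System.out.println(toCentsViaDouble("0.29")); // prints 28
        System.out.println(toCentsExact("0.29"));     // prints 29
    }
}
```

For a payment import that later reconciles detail amounts against a summary total, the exact variant is the safer default.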

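The four code blocks above are the main flow's business handling of the Excel fields. One thing to note when calling the OSP interface (a kind of RPC): because these are distributed remote calls, a transactional rollback on failure is not available. You have to catch exceptions yourself and explicitly call a compensating operation when a call fails.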
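The batch-then-compensate pattern can be reduced to a small sketch. Everything here is hypothetical: RemoteImportService stands in for the remote client and is not the OSP API, and FakeService is an in-memory double used only to exercise the logic. The assumption is that the remote side offers a "delete everything for this import" call that can serve as compensation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical demo class: import rows in batches and compensate manually on failure. */
class BatchCompensationSketch {

    /** Stand-in for the remote service; not the real OSP client. */
    interface RemoteImportService {
        void importBatch(List<String> rows); // may throw RuntimeException on failure
        void deleteAll();                    // compensating call: removes what was imported so far
    }

    /** Returns true if every batch was imported, false if a batch failed and compensation ran. */
    static boolean importAll(List<String> rows, int firstRow, int batchSize, RemoteImportService service) {
        for (int start = firstRow; start < rows.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rows.size());
            try {
                service.importBatch(new ArrayList<>(rows.subList(start, end)));
            } catch (RuntimeException e) {
                // No distributed transaction to roll back, so undo explicitly
                service.deleteAll();
                return false;
            }
        }
        return true;
    }

    /** In-memory fake that fails on the given call number (0 means never). */
    static class FakeService implements RemoteImportService {
        final List<String> stored = new ArrayList<>();
        private final int failOnCall;
        private int calls;

        FakeService(int failOnCall) {
            this.failOnCall = failOnCall;
        }

        @Override
        public void importBatch(List<String> rows) {
            if (++calls == failOnCall) {
                throw new IllegalStateException("simulated remote failure");
            }
            stored.addAll(rows);
        }

        @Override
        public void deleteAll() {
            stored.clear();
        }
    }

    public static void main(String[] args) {
        List<String> rows = List.of("h1", "h2", "h3", "d1", "d2", "d3", "d4", "d5");
        FakeService failing = new FakeService(2);
        System.out.println(importAll(rows, 3, 2, failing) + " " + failing.stored);
    }
}
```

If the compensating call itself can fail, that failure has to be reported separately, since the remote side may then hold a partial import.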

Now for the important part: the core code for parsing very large Excel files.

package com.vip.vpal.mgr.controller;

import java.io.IOException;
import java.io.InputStream;

import javax.xml.parsers.ParserConfigurationException;

import org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler;
import org.apache.poi.xssf.model.StylesTable;
import org.apache.poi.xssf.usermodel.XSSFComment;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.util.CellAddress;
import org.apache.poi.ss.util.CellReference;
import org.apache.poi.util.SAXHelper;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.web.multipart.commons.CommonsMultipartFile;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

import com.vip.vpal.mgr.dto.ProcessTransDetailDataDto;

/**
 * Batch reader for large Excel xlsx files.
 */
public class XlsxProcessAbstract {

    private final Logger logger = LoggerFactory.getLogger(XlsxProcessAbstract.class);

    // Rows with an index greater than rowIndex are collected; row indexes start at 0
    private int rowIndex = -1;
    private final int minColumns = 0;

    /**
     * Destination for data
     */
    private final StringBuffer rowStrs = new StringBuffer();

    ProcessTransDetailDataDto processTransDetailData = new ProcessTransDetailDataDto();

    /**
     * Parses every sheet of one Excel file.
     * Rows are handled one at a time, and every cell value is a String.
     *
     * @param filename
     * @return
     * @throws Exception
     */
    public ProcessTransDetailDataDto processAllSheet(String filename) throws Exception {
        return processPackage(OPCPackage.open(filename, PackageAccess.READ));
    }

    /**
     * Parses every sheet of one uploaded Excel file.
     * Rows are handled one at a time, and every cell value is a String.
     *
     * @param xlsxFile
     * @return
     * @throws Exception
     * @author nevin.zhang
     */
    public ProcessTransDetailDataDto processAllSheet(CommonsMultipartFile xlsxFile) throws Exception {
        return processPackage(OPCPackage.open(xlsxFile.getInputStream()));
    }

    /**
     * Shared by both overloads above, which were verbatim copies of each other in the original.
     */
    private ProcessTransDetailDataDto processPackage(OPCPackage pkg) throws Exception {
        try {
            ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(pkg);
            XSSFReader xssfReader = new XSSFReader(pkg);
            StylesTable styles = xssfReader.getStylesTable();
            XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) xssfReader.getSheetsData();
            while (iter.hasNext()) {
                // try-with-resources closes the sheet stream; the original called stream.close()
                // in finally without a null check
                try (InputStream stream = iter.next()) {
                    parserSheetXml(styles, strings, new SheetToCSV(), stream);
                } catch (Exception e) {
                    logger.error("parserSheetXml error: ", e);
                }
            }
            return processTransDetailData;
        } finally {
            // Release the package without saving anything; the original never closed it
            pkg.revert();
        }
    }

    /**
     * Streams the XML of one sheet through the SAX parser.
     *
     * @param styles
     * @param strings
     * @param sheetHandler
     * @param sheetInputStream
     * @throws IOException
     * @throws SAXException
     */
    public void parserSheetXml(StylesTable styles, ReadOnlySharedStringsTable strings, SheetContentsHandler sheetHandler,
            InputStream sheetInputStream) throws IOException, SAXException {
        DataFormatter formatter = new DataFormatter();
        InputSource sheetSource = new InputSource(sheetInputStream);
        try {
            XMLReader sheetParser = SAXHelper.newXMLReader();
            ContentHandler handler = new XSSFSheetXMLHandler(styles, null, strings, sheetHandler, formatter, false);
            sheetParser.setContentHandler(handler);
            sheetParser.parse(sheetSource);
        } catch (ParserConfigurationException e) {
            throw new RuntimeException("SAX parser appears to be broken - " + e);
        }
    }

    /**
     * Reads the row and cell values of a sheet.
     *
     * @author nevin.zhang
     */
    private class SheetToCSV implements SheetContentsHandler {
        private boolean firstCellOfRow = false;
        private int currentRowNumber = -1;
        private int currentColNumber = -1;

        /**
         * Handles rows that are missing from the sheet XML.
         *
         * @param number
         */
        private void processCellBlankCells(int number) {
            for (int i = 0; i < number; i++) {
                for (int j = 0; j < minColumns; j++) {
                    rowStrs.append("|@|");
                }
                rowStrs.append('\n');
            }
        }

        @Override
        public void startRow(int rowNum) {
            processCellBlankCells(rowNum - currentRowNumber - 1);
            firstCellOfRow = true;
            currentRowNumber = rowNum;
            currentColNumber = -1;
        }

        @Override
        public void endRow(int rowNum) {
            for (int i = currentColNumber; i < minColumns; i++) {
                rowStrs.append("|@|");
            }
            // Rows after rowIndex are added to the list. The first three rows are headers/summary;
            // for every sheet the detail data starts after them.
            String endRowStrs = rowStrs.toString();
            if (currentRowNumber > rowIndex && !rowStrs.toString().equals("|@|")) {
                processTransDetailData.contentList.add(endRowStrs);
            }
            if (!rowStrs.toString().equals("|@|")) {
                processTransDetailData.processTransTotalData(endRowStrs, currentRowNumber);
            }
            rowStrs.delete(0, rowStrs.length()); // clear the buffer
        }

        @Override
        public void cell(String cellReference, String cellValue, XSSFComment comment) {
            if (firstCellOfRow) {
                firstCellOfRow = false;
            } else {
                rowStrs.append("|@|");
            }
            if (cellReference == null) {
                cellReference = new CellAddress(currentRowNumber, currentColNumber).formatAsString();
            }
            int thisCol = (new CellReference(cellReference)).getCol();
            int missedCols = thisCol - currentColNumber - 1;
            for (int i = 0; i < missedCols; i++) {
                // Cells that are empty in the Excel file are written as "|@|"
                rowStrs.append("|@|");
            }
            currentColNumber = thisCol;
            rowStrs.append(cellValue);
        }

        @Override
        public void headerFooter(String text, boolean isHeader, String tagName) {
        }
    }
}

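The imported Excel file has the following layout: row 1 is the header of the summary data, row 2 is the summary data, row 3 is the header of the detail rows, and all remaining rows are detail data. A ProcessTransDetailDataDto entity class is therefore used to wrap it.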

package com.vip.vpal.mgr.dto;

import com.vip.vpal.mgr.enums.TransTypeEnums;
import org.apache.commons.lang.StringUtils;

import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

/**
 * Value handling for the transaction import detail file.
 */
public class ProcessTransDetailDataDto {
    private String merchantId;         // merchant ID
    private String merchantName;       // merchant name
    private String settleOrderNo;      // batch number
    private String totalPaymentCount;  // total number of payments
    private String totalPaymentAmount; // total payment amount
    private String totalRefundAmount;  // total refund amount
    private String totalRefundCount;   // total number of refunds
    private String netTotalAmount;     // net total amount
    private String netTotalCount;      // net total count
    private int currentRowNumber;
    private int paymentCount = 0;
    private BigDecimal paymentAmount = BigDecimal.ZERO;
    private int refundCount = 0;
    private BigDecimal refundAmount = BigDecimal.ZERO;
    private int paymentIndex = 0;
    private int refundIndex = 0;
    private int readRowTitleIndex = 1;  // index of the summary row (second row)
    private int readDetailRowIndex = 2; // detail rows come after this index
    public List<String> contentList = new ArrayList<>();

    public void processTransTotalData(String rowStrs, int currentRowNumber) {
        // limit -1 keeps trailing empty cells, so the column indexes below stay valid
        String[] cellStrs = rowStrs.split("\\|@\\|", -1);
        // Read the summary row (second row)
        if (currentRowNumber == readRowTitleIndex) {
            this.setMerchantId(cellStrs[0]);
            this.setMerchantName(cellStrs[1]);
            this.setSettleOrderNo(cellStrs[2]);
            this.setTotalPaymentCount(cellStrs[3]);
            this.setTotalPaymentAmount(cellStrs[4]);
            this.setTotalRefundCount(cellStrs[5]);
            this.setTotalRefundAmount(cellStrs[6]);
            this.setNetTotalCount(cellStrs[7]);
            this.setNetTotalAmount(cellStrs[8]);
        }
        // Read the transaction detail rows
        if (currentRowNumber > readDetailRowIndex) {
            // Column 5 is the transaction type: refund, or payment (also the default when empty)
            if (cellStrs[5].equals(TransTypeEnums.Refund.getValue())) {
                refundIndex++;
                this.setRefundCount(refundIndex); // accumulate the refund count
                this.setRefundAmount(this.getRefundAmount().add(stringToBigDecimal(cellStrs[7]))); // accumulate the refund amount
            } else if (cellStrs[5].equals("") || cellStrs[5].equals(TransTypeEnums.Payment.getValue())) {
                paymentIndex++;
                this.setPaymentCount(paymentIndex);
                this.setPaymentAmount(this.getPaymentAmount().add(stringToBigDecimal(cellStrs[7]))); // accumulate the payment amount
            }
        }
    }

    private static BigDecimal stringToBigDecimal(String str) {
        if (StringUtils.isBlank(str)) {
            return BigDecimal.ZERO;
        }
        BigDecimal bd = new BigDecimal(str);
        return bd;
    }

    public String dtoToString() {
        StringBuffer sb = new StringBuffer("");
        sb.append("merchantId=" + this.merchantId);
        sb.append(", merchantName=" + this.merchantName);
        sb.append(", settleOrderNo=" + this.settleOrderNo);
        sb.append(", totalPaymentCount=" + this.totalPaymentCount);
        sb.append(", totalPaymentAmount=" + this.totalPaymentAmount);
        sb.append(", totalRefundAmount=" + this.totalRefundAmount);
        sb.append(", totalRefundCount=" + this.totalRefundCount);
        sb.append(", netTotalAmount=" + this.netTotalAmount);
        sb.append(", netTotalCount=" + this.netTotalCount);
        return sb.toString();
    }

    public String getMerchantId() { return merchantId; }
    public void setMerchantId(String merchantId) { this.merchantId = merchantId; }
    public String getMerchantName() { return merchantName; }
    public void setMerchantName(String merchantName) { this.merchantName = merchantName; }
    public String getSettleOrderNo() { return settleOrderNo; }
    public void setSettleOrderNo(String settleOrderNo) { this.settleOrderNo = settleOrderNo; }
    public String getTotalPaymentCount() { return totalPaymentCount; }
    public void setTotalPaymentCount(String totalPaymentCount) { this.totalPaymentCount = totalPaymentCount; }
    public String getTotalPaymentAmount() { return totalPaymentAmount; }
    public void setTotalPaymentAmount(String totalPaymentAmount) { this.totalPaymentAmount = totalPaymentAmount; }
    public String getTotalRefundAmount() { return totalRefundAmount; }
    public void setTotalRefundAmount(String totalRefundAmount) { this.totalRefundAmount = totalRefundAmount; }
    public String getTotalRefundCount() { return totalRefundCount; }
    public void setTotalRefundCount(String totalRefundCount) { this.totalRefundCount = totalRefundCount; }
    public String getNetTotalAmount() { return netTotalAmount; }
    public void setNetTotalAmount(String netTotalAmount) { this.netTotalAmount = netTotalAmount; }
    public String getNetTotalCount() { return netTotalCount; }
    public void setNetTotalCount(String netTotalCount) { this.netTotalCount = netTotalCount; }
    public int getCurrentRowNumber() { return currentRowNumber; }
    public void setCurrentRowNumber(int currentRowNumber) { this.currentRowNumber = currentRowNumber; }
    public int getPaymentCount() { return paymentCount; }
    public void setPaymentCount(int paymentCount) { this.paymentCount = paymentCount; }
    public BigDecimal getPaymentAmount() { return paymentAmount; }
    public void setPaymentAmount(BigDecimal paymentAmount) { this.paymentAmount = paymentAmount; }
    public int getRefundCount() { return refundCount; }
    public void setRefundCount(int refundCount) { this.refundCount = refundCount; }
    public BigDecimal getRefundAmount() { return refundAmount; }
    public void setRefundAmount(BigDecimal refundAmount) { this.refundAmount = refundAmount; }
    public List<String> getContentList() { return contentList; }
    public void setContentList(List<String> contentList) { this.contentList = contentList; }
}

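The code above takes empty rows and empty cells in the original Excel file into account. I hit one pitfall during development: cells were at first separated with "," which put data and columns out of alignment whenever a value itself contained a comma. So this post uses "|@|" as the separator instead of a comma, to make accidental collisions as unlikely as possible.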
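One detail of the "|@|" separator is easy to miss: String.split with a regex drops trailing empty strings by default, so a row whose last cells are empty comes back with fewer columns than it was written with. A minimal sketch follows (DelimiterSketch and its methods are hypothetical names); passing a limit of -1 keeps the empty cells.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical demo class: joining and splitting rows with the "|@|" separator. */
class DelimiterSketch {
    static final String SEP = "|@|";
    // Pattern.quote escapes '|', which is a regex metacharacter
    private static final Pattern SEP_PATTERN = Pattern.compile(Pattern.quote(SEP));

    static String join(List<String> cells) {
        return String.join(SEP, cells);
    }

    /** A limit of -1 keeps trailing empty cells. */
    static String[] split(String row) {
        return SEP_PATTERN.split(row, -1);
    }

    public static void main(String[] args) {
        String row = join(Arrays.asList("M001", "1,234.50", "", ""));
        System.out.println(row);                           // prints M001|@|1,234.50|@||@|
        System.out.println(row.split("\\|@\\|").length);   // prints 2
        System.out.println(split(row).length);             // prints 4
    }
}
```

The comma inside "1,234.50" survives intact, which is the collision the separator was chosen to avoid.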

Below is the core source code from the POI jar that this depends on, for reference:

/* ====================================================================
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements. See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License. You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
==================================================================== */
package org.apache.poi.xssf.eventusermodel;

import static org.apache.poi.xssf.usermodel.XSSFRelation.NS_SPREADSHEETML;

import java.util.LinkedList;
import java.util.Queue;

import org.apache.poi.ss.usermodel.BuiltinFormats;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.util.CellAddress;
import org.apache.poi.util.POILogFactory;
import org.apache.poi.util.POILogger;
import org.apache.poi.xssf.model.CommentsTable;
import org.apache.poi.xssf.model.StylesTable;
import org.apache.poi.xssf.usermodel.XSSFCellStyle;
import org.apache.poi.xssf.usermodel.XSSFComment;
import org.apache.poi.xssf.usermodel.XSSFRichTextString;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTComment;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/**
 * This class handles the processing of a sheet#.xml
 *  sheet part of a XSSF .xlsx file, and generates
 *  row and cell events for it.
 */
public class XSSFSheetXMLHandler extends DefaultHandler {
    private static final POILogger logger = POILogFactory.getLogger(XSSFSheetXMLHandler.class);

    /**
     * These are the different kinds of cells we support.
     * We keep track of the current one between
     *  the start and end.
     */
    enum xssfDataType {
        BOOLEAN,
        ERROR,
        FORMULA,
        INLINE_STRING,
        SST_STRING,
        NUMBER,
    }

    /**
     * Table with the styles used for formatting
     */
    private StylesTable stylesTable;

    /**
     * Table with cell comments
     */
    private CommentsTable commentsTable;

    /**
     * Read only access to the shared strings table, for looking
     *  up (most) string cell's contents
     */
    private ReadOnlySharedStringsTable sharedStringsTable;

    /**
     * Where our text is going
     */
    private final SheetContentsHandler output;

    // Set when V start element is seen
    private boolean vIsOpen;
    // Set when F start element is seen
    private boolean fIsOpen;
    // Set when an Inline String "is" is seen
    private boolean isIsOpen;
    // Set when a header/footer element is seen
    private boolean hfIsOpen;

    // Set when cell start element is seen;
    // used when cell close element is seen.
    private xssfDataType nextDataType;

    // Used to format numeric cell values.
    private short formatIndex;
    private String formatString;
    private final DataFormatter formatter;
    private int rowNum;
    private int nextRowNum; // some sheets do not have rowNums, Excel can read them so we should try to handle them correctly as well
    private String cellRef;
    private boolean formulasNotResults;

    // Gathers characters as they are seen.
    private StringBuffer value = new StringBuffer();
    private StringBuffer formula = new StringBuffer();
    private StringBuffer headerFooter = new StringBuffer();

    private Queue<CellAddress> commentCellRefs;

    /**
     * Accepts objects needed while parsing.
     *
     * @param styles  Table of styles
     * @param strings Table of shared strings
     */
    public XSSFSheetXMLHandler(
            StylesTable styles,
            CommentsTable comments,
            ReadOnlySharedStringsTable strings,
            SheetContentsHandler sheetContentsHandler,
            DataFormatter dataFormatter,
            boolean formulasNotResults) {
        this.stylesTable = styles;
        this.commentsTable = comments;
        this.sharedStringsTable = strings;
        this.output = sheetContentsHandler;
        this.formulasNotResults = formulasNotResults;
        this.nextDataType = xssfDataType.NUMBER;
        this.formatter = dataFormatter;
        init();
    }

    /**
     * Accepts objects needed while parsing.
     *
     * @param styles  Table of styles
     * @param strings Table of shared strings
     */
    public XSSFSheetXMLHandler(
            StylesTable styles,
            ReadOnlySharedStringsTable strings,
            SheetContentsHandler sheetContentsHandler,
            DataFormatter dataFormatter,
            boolean formulasNotResults) {
        this(styles, null, strings, sheetContentsHandler, dataFormatter, formulasNotResults);
    }

    /**
     * Accepts objects needed while parsing.
     *
     * @param styles  Table of styles
     * @param strings Table of shared strings
     */
    public XSSFSheetXMLHandler(
            StylesTable styles,
            ReadOnlySharedStringsTable strings,
            SheetContentsHandler sheetContentsHandler,
            boolean formulasNotResults) {
        this(styles, strings, sheetContentsHandler, new DataFormatter(), formulasNotResults);
    }

    private void init() {
        if (commentsTable != null) {
            commentCellRefs = new LinkedList<CellAddress>();
            for (CTComment comment : commentsTable.getCTComments().getCommentList().getCommentArray()) {
                commentCellRefs.add(new CellAddress(comment.getRef()));
            }
        }
    }

    private boolean isTextTag(String name) {
        if("v".equals(name)) {
            // Easy, normal v text tag
            return true;
        }
        if("inlineStr".equals(name)) {
            // Easy inline string
            return true;
        }
        if("t".equals(name) && isIsOpen) {
            // Inline string <is><t>...</t></is> pair
            return true;
        }
        // It isn't a text tag
        return false;
    }

    @Override
    @SuppressWarnings("unused")
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) throws SAXException {
        if (uri != null && ! uri.equals(NS_SPREADSHEETML)) {
            return;
        }
        if (isTextTag(localName)) {
            vIsOpen = true;
            // Clear contents cache
            value.setLength(0);
        } else if ("is".equals(localName)) {
            // Inline string outer tag
            isIsOpen = true;
        } else if ("f".equals(localName)) {
            // Clear contents cache
            formula.setLength(0);
            // Mark us as being a formula if not already
            if(nextDataType == xssfDataType.NUMBER) {
                nextDataType = xssfDataType.FORMULA;
            }
            // Decide where to get the formula string from
            String type = attributes.getValue("t");
            if(type != null && type.equals("shared")) {
                // Is it the one that defines the shared, or uses it?
                String ref = attributes.getValue("ref");
                String si = attributes.getValue("si");
                if(ref != null) {
                    // This one defines it
                    // TODO Save it somewhere
                    fIsOpen = true;
                } else {
                    // This one uses a shared formula
                    // TODO Retrieve the shared formula and tweak it to
                    //  match the current cell
                    if(formulasNotResults) {
                        logger.log(POILogger.WARN, "shared formulas not yet supported!");
                    } else {
                        // It's a shared formula, so we can't get at the formula string yet
                        // However, they don't care about the formula string, so that's ok!
                    }
                }
            } else {
                fIsOpen = true;
            }
        }
        else if("oddHeader".equals(localName) || "evenHeader".equals(localName) ||
                "firstHeader".equals(localName) || "firstFooter".equals(localName) ||
                "oddFooter".equals(localName) || "evenFooter".equals(localName)) {
            hfIsOpen = true;
            // Clear contents cache
            headerFooter.setLength(0);
        }
        else if("row".equals(localName)) {
            String rowNumStr = attributes.getValue("r");
            if(rowNumStr != null) {
                rowNum = Integer.parseInt(rowNumStr) - 1;
            } else {
                rowNum = nextRowNum;
            }
            output.startRow(rowNum);
        }
        // c => cell
        else if ("c".equals(localName)) {
            // Set up defaults.
            this.nextDataType = xssfDataType.NUMBER;
            this.formatIndex = -1;
            this.formatString = null;
            cellRef = attributes.getValue("r");
            String cellType = attributes.getValue("t");
            String cellStyleStr = attributes.getValue("s");
            if ("b".equals(cellType))
                nextDataType = xssfDataType.BOOLEAN;
            else if ("e".equals(cellType))
                nextDataType = xssfDataType.ERROR;
            else if ("inlineStr".equals(cellType))
                nextDataType = xssfDataType.INLINE_STRING;
            else if ("s".equals(cellType))
                nextDataType = xssfDataType.SST_STRING;
            else if ("str".equals(cellType))
                nextDataType = xssfDataType.FORMULA;
            else {
                // Number, but almost certainly with a special style or format
                XSSFCellStyle style = null;
                if (stylesTable != null) {
                    if (cellStyleStr != null) {
                        int styleIndex = Integer.parseInt(cellStyleStr);
                        style = stylesTable.getStyleAt(styleIndex);
                    } else if (stylesTable.getNumCellStyles() > 0) {
                        style = stylesTable.getStyleAt(0);
                    }
                }
                if (style != null) {
                    this.formatIndex = style.getDataFormat();
                    this.formatString = style.getDataFormatString();
                    if (this.formatString == null)
                        this.formatString = BuiltinFormats.getBuiltinFormat(this.formatIndex);
                }
            }
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        if (uri != null && ! uri.equals(NS_SPREADSHEETML)) {
            return;
        }
        String thisStr = null;
        // v => contents of a cell
        if (isTextTag(localName)) {
            vIsOpen = false;
            // Process the value contents as required, now we have it all
            switch (nextDataType) {
                case BOOLEAN:
                    char first = value.charAt(0);
                    thisStr = first == '0' ? "FALSE" : "TRUE";
                    break;
                case ERROR:
                    thisStr = "ERROR:" + value.toString();
                    break;
                case FORMULA:
                    if(formulasNotResults) {
                        thisStr = formula.toString();
                    } else {
                        String fv = value.toString();
                        if (this.formatString != null) {
                            try {
                                // Try to use the value as a formattable number
                                double d = Double.parseDouble(fv);
                                thisStr = formatter.formatRawCellContents(d, this.formatIndex, this.formatString);
                            } catch(NumberFormatException e) {
                                // Formula is a String result not a Numeric one
                                thisStr = fv;
                            }
                        } else {
                            // No formating applied, just do raw value in all cases
                            thisStr = fv;
                        }
                    }
                    break;
                case INLINE_STRING:
                    // TODO: Can these ever have formatting on them?
                    XSSFRichTextString rtsi = new XSSFRichTextString(value.toString());
                    thisStr = rtsi.toString();
                    break;
                case SST_STRING:
                    String sstIndex = value.toString();
                    try {
                        int idx = Integer.parseInt(sstIndex);
                        XSSFRichTextString rtss = new XSSFRichTextString(sharedStringsTable.getEntryAt(idx));
                        thisStr = rtss.toString();
                    }
                    catch (NumberFormatException ex) {
                        logger.log(POILogger.ERROR, "Failed to parse SST index '" + sstIndex, ex);
                    }
                    break;
                case NUMBER:
                    String n = value.toString();
                    if (this.formatString != null && n.length() > 0)
                        thisStr = formatter.formatRawCellContents(Double.parseDouble(n), this.formatIndex, this.formatString);
                    else
                        thisStr = n;
                    break;
                default:
                    thisStr = "(TODO: Unexpected type: " + nextDataType + ")";
                    break;
            }
            // Do we have a comment for this cell?
            checkForEmptyCellComments(EmptyCellCommentsCheckType.CELL);
            XSSFComment comment = commentsTable != null ? 
commentsTable.findCellComment(new CellAddress(cellRef)) : null;// Outputoutput.cell(cellRef, thisStr, comment);} else if ("f".equals(localName)) {fIsOpen = false;} else if ("is".equals(localName)) {isIsOpen = false;} else if ("row".equals(localName)) {// Handle any "missing" cells which had comments attachedcheckForEmptyCellComments(EmptyCellCommentsCheckType.END_OF_ROW);// Finish up the rowoutput.endRow(rowNum);// some sheets do not have rowNum set in the XML, Excel can read them so we should try to read them as wellnextRowNum = rowNum + 1;} else if ("sheetData".equals(localName)) {// Handle any "missing" cells which had comments attachedcheckForEmptyCellComments(EmptyCellCommentsCheckType.END_OF_SHEET_DATA);}else if("oddHeader".equals(localName) || "evenHeader".equals(localName) ||"firstHeader".equals(localName)) {hfIsOpen = false;output.headerFooter(headerFooter.toString(), true, localName);}else if("oddFooter".equals(localName) || "evenFooter".equals(localName) ||"firstFooter".equals(localName)) {hfIsOpen = false;output.headerFooter(headerFooter.toString(), false, localName);}}/*** Captures characters only if a suitable element is open.* Originally was just "v"; extended for inlineStr also.*/@Overridepublic void characters(char[] ch, int start, int length)throws SAXException {if (vIsOpen) {value.append(ch, start, length);}if (fIsOpen) {formula.append(ch, start, length);}if (hfIsOpen) {headerFooter.append(ch, start, length);}}/*** Do a check for, and output, comments in otherwise empty cells.*/private void checkForEmptyCellComments(EmptyCellCommentsCheckType type) {if (commentCellRefs != null && !commentCellRefs.isEmpty()) {// If we've reached the end of the sheet data, output any// comments we haven't yet already handledif (type == EmptyCellCommentsCheckType.END_OF_SHEET_DATA) {while (!commentCellRefs.isEmpty()) {outputEmptyCellComment(commentCellRefs.remove());}return;}// At the end of a row, handle any comments for "missing" rows before usif (this.cellRef == 
null) {if (type == EmptyCellCommentsCheckType.END_OF_ROW) {while (!commentCellRefs.isEmpty()) {if (commentCellRefs.peek().getRow() == rowNum) {outputEmptyCellComment(commentCellRefs.remove());} else {return;}}return;} else {throw new IllegalStateException("Cell ref should be null only if there are only empty cells in the row; rowNum: " + rowNum);}}CellAddress nextCommentCellRef;do {CellAddress cellRef = new CellAddress(this.cellRef);CellAddress peekCellRef = commentCellRefs.peek();if (type == EmptyCellCommentsCheckType.CELL && cellRef.equals(peekCellRef)) {// remove the comment cell ref from the list if we're about to handle it alongside the cell contentcommentCellRefs.remove();return;} else {// fill in any gaps if there are empty cells with comment mixed in with non-empty cellsint comparison = peekCellRef.compareTo(cellRef);if (comparison > 0 && type == EmptyCellCommentsCheckType.END_OF_ROW && peekCellRef.getRow() <= rowNum) {nextCommentCellRef = commentCellRefs.remove();outputEmptyCellComment(nextCommentCellRef);} else if (comparison < 0 && type == EmptyCellCommentsCheckType.CELL && peekCellRef.getRow() <= rowNum) {nextCommentCellRef = commentCellRefs.remove();outputEmptyCellComment(nextCommentCellRef);} else {nextCommentCellRef = null;}}} while (nextCommentCellRef != null && !commentCellRefs.isEmpty());}}/*** Output an empty-cell comment.*/private void outputEmptyCellComment(CellAddress cellRef) {XSSFComment comment = commentsTable.findCellComment(cellRef);output.cell(cellRef.formatAsString(), null, comment);}private enum EmptyCellCommentsCheckType {CELL,END_OF_ROW,END_OF_SHEET_DATA}/*** You need to implement this to handle the results* of the sheet parsing.*/public interface SheetContentsHandler {/** A row with the (zero based) row number has started */public void startRow(int rowNum);/** A row with the (zero based) row number has ended */public void endRow(int rowNum);/** * A cell, with the given formatted value (may be null), * and possibly a comment (may be 
null), was encountered */public void cell(String cellReference, String formattedValue, XSSFComment comment);/** A header or footer has been encountered */public void headerFooter(String text, boolean isHeader, String tagName);} }
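
To make the event flow of the handler above easier to follow, here is a minimal sketch that uses only the JDK's SAX parser, not POI. It mirrors the same idea: `startElement`, `characters` and `endElement` fire as the XML streams past, and a callback receives `startRow`, `cell` and `endRow` events, so only the current row is held by the parser at any time. The class `StreamingRowsSketch`, the interface `RowCallback` and the method `collectRows` are hypothetical names invented for this illustration. The sample XML is a heavily simplified stand-in for a real sheet: it has no namespace, no shared strings and no styles, and every row is assumed to carry an `r` attribute.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical JDK-only sketch of SAX-style sheet streaming; this is not the POI API. */
class StreamingRowsSketch {

    /** Simplified stand-in for POI's SheetContentsHandler (hypothetical). */
    interface RowCallback {
        void startRow(int rowNum);
        void cell(String cellReference, String value);
        void endRow(int rowNum);
    }

    /** Streams row/c/v elements to the callback one event at a time. */
    static void parse(String sheetXml, RowCallback callback) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed so that localName is populated
        SAXParser parser = factory.newSAXParser();
        byte[] bytes = sheetXml.getBytes(StandardCharsets.UTF_8);
        parser.parse(new ByteArrayInputStream(bytes), new DefaultHandler() {
            private final StringBuilder value = new StringBuilder();
            private boolean vIsOpen;
            private String cellRef;
            private int rowNum;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("row".equals(localName)) {
                    // Assumption: the r attribute is always present in this sketch.
                    rowNum = Integer.parseInt(attrs.getValue("r")) - 1; // zero-based
                    callback.startRow(rowNum);
                } else if ("c".equals(localName)) {
                    cellRef = attrs.getValue("r");
                } else if ("v".equals(localName)) {
                    vIsOpen = true;
                    value.setLength(0); // clear the text buffer for the new cell
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("v".equals(localName)) {
                    vIsOpen = false;
                    callback.cell(cellRef, value.toString());
                } else if ("row".equals(localName)) {
                    callback.endRow(rowNum);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Text may arrive in several chunks, so append instead of assigning.
                if (vIsOpen) {
                    value.append(ch, start, length);
                }
            }
        });
    }

    /** Collects every row as one comma-joined line, similar in spirit to a content list. */
    static List<String> collectRows(String sheetXml) throws Exception {
        List<String> rows = new ArrayList<>();
        parse(sheetXml, new RowCallback() {
            private final List<String> current = new ArrayList<>();
            public void startRow(int rowNum) { current.clear(); }
            public void cell(String cellReference, String value) { current.add(value); }
            public void endRow(int rowNum) { rows.add(String.join(",", current)); }
        });
        return rows;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<worksheet><sheetData>"
                + "<row r=\"1\"><c r=\"A1\"><v>id</v></c><c r=\"B1\"><v>amount</v></c></row>"
                + "<row r=\"2\"><c r=\"A2\"><v>1001</v></c><c r=\"B2\"><v>25.50</v></c></row>"
                + "</sheetData></worksheet>";
        for (String row : collectRows(xml)) {
            System.out.println(row);
        }
    }
}
```

In a real import, the callback would more likely validate or batch each row inside `endRow` than keep every row in a list; the list here only keeps the sketch short. Note also that cells with no value do not produce a `cell` event in this sketch, so column positions should be taken from `cellReference` rather than from the order of the calls.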

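In performance testing with an Excel file containing 1,000,000 rows, the step "receive the file uploaded from the front end and extract its summary data and detail data" took only 25 seconds. That is fast enough for our purposes, and it resolves the performance bottleneck in this system.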
