Indexing PDF files in ElasticSearch with Java code?
Here is my code:

    try (InputStream inputStream = new FileInputStream(new File("mypdf.pdf"))) {
        byte[] fileByteStream = IOUtils.toByteArray(inputStream);
        // First Base64 pass over the raw PDF bytes.
        String base64String = Base64.getEncoder().encodeToString(fileByteStream);
        // Second Base64 pass over the already-encoded string (the decoding code undoes both passes).
        String strEncoded = Base64.getEncoder().encodeToString(base64String.getBytes("utf-8"));

        // Wrap the encoded payload in a JSON document with a single "data" field.
        JSONObject correspondenceNode = new JSONObject();
        correspondenceNode.put("data", strEncoded);
        String strJsonValues = correspondenceNode.toString();

        HttpEntity entity = new NStringEntity(strJsonValues, ContentType.APPLICATION_JSON);
        elasticrestClient.put("/2018/documents/1", entity);
    } catch (IOException e) {
        e.printStackTrace();
    }

Here is the decoding code:

    String responseBody = elasticrestClient.get("/2018/documents/1");
    // some code to fetch the hits
    JSONObject h = hitsArray.getJSONObject(0);
    source = h.getJSONObject("_source");
    String object = source.getString("data");
    // Undo the outer Base64 pass, then the inner one, to recover the PDF bytes.
    byte[] decodedStr = Base64.getDecoder().decode(object);
    try (FileOutputStream fos = new FileOutputStream("download.pdf")) {
        fos.write(Base64.getDecoder().decode(new String(decodedStr, "utf-8")));
    }

Answer (answered 2018-08-02):
Extract the text and metadata and index those, with the url field pointing to the binary file itself.

    {
      "content": "Extracted text here",
      "meta": {
        // Meta data there
      },
      "url": "file://path/to/file"
    }
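
As a rough illustration of the document shape suggested above, here is a minimal sketch that assembles the content / meta / url JSON using only the JDK. It assumes the text and metadata have already been extracted from the PDF by some other means (the extraction step is not shown). The class name PdfDocumentJson and the methods escape and build are hypothetical names made up for this example, and the hand-written escaping is only a stand-in for a JSON library such as the JSONObject used in the question.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds the {"content", "meta", "url"} document as a JSON string.
class PdfDocumentJson {

    // Escapes a value so it can be placed inside a JSON string literal.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters become 4-digit hex escapes.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // content: text already extracted from the PDF (extraction not shown here).
    // meta:    metadata key/value pairs, e.g. title or author.
    // file:    location of the original binary; only its URI is stored in the document.
    static String build(String content, Map<String, String> meta, Path file) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"content\":\"").append(escape(content)).append("\",\"meta\":{");
        boolean first = true;
        for (Map.Entry<String, String> e : meta.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
            first = false;
        }
        sb.append("},\"url\":\"").append(escape(file.toUri().toString())).append("\"}");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("title", "Example title");   // placeholder metadata
        meta.put("author", "Unknown");        // placeholder metadata
        System.out.println(build("Extracted text here", meta, Paths.get("mypdf.pdf")));
    }
}
```

The resulting string could then be sent the same way as in the question, for example wrapped in an NStringEntity and passed to elasticrestClient.put. Because only the extracted text is indexed, the search hits stay small and the PDF itself is fetched from the url when needed.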