Document Scanning and Recognition: Photo Document Rectification Based on M-LSD Line Segment Detection
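Preface
1. Photographing, scanning, and recognizing documents is one of the most common features in office apps, and many such apps are on the market. The main techniques all rely on image processing: edge detection and rectification, document filters, and OCR.
2. Document edge detection can be done with traditional algorithms or with deep learning. Traditional algorithms do not generalize well across scenes. On the deep-learning side, I previously wrote an ID-card edge detection project using HED, but its speed on mobile devices was not satisfactory.
3. Mobile LSD (M-LSD) is a real-time, lightweight line segment detection network for resource-constrained environments. It runs at 56.8 FPS on Android and 48.6 FPS on iPhone mobile devices. The authors also state that it is the first real-time deep LSD method usable on mobile devices. Paper: https://arxiv.org/abs/2106.00186 ; source code: https://github.com/navervision/mlsd .
4. Official test results:
Results of the C++ implementation:
5. My development environment is Windows 10, VS2019, OpenCV 4.5, and ncnn. Enabling GPU acceleration also requires the Vulkan SDK. The implementation language is C++.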
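Line detection with M-LSD
1. The official repository provides models in several sizes. They have to be converted to ncnn models first; see the official ncnn documentation for the conversion steps.
2. After conversion, the model is used for line segment detection. Inference code: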
int MLSD::detectLines(const cv::Mat& cv_src, std::vector<Line>& lines, int topk, float score_threshold, float dist_threshold)
{
    // Scale factors from the network input size back to the source image.
    w_scale = cv_src.cols / float(target_size);
    h_scale = cv_src.rows / float(target_size);
    int out_size = target_size / 2;

    ncnn::Extractor ex = lsdnet.create_extractor();
    // cv_src is in OpenCV's BGR order; PIXEL_RGB2BGRA swaps the first and third
    // channels and appends an alpha channel, producing a 4-channel input.
    ncnn::Mat ncnn_in = ncnn::Mat::from_pixels_resize(cv_src.data, ncnn::Mat::PIXEL_RGB2BGRA, cv_src.cols, cv_src.rows, target_size, target_size);
    ncnn_in.substract_mean_normalize(0, norm_vals);
    ex.input("input", ncnn_in);

    ncnn::Mat org_disp_map, max_map, center_map;
    ex.extract("out1", org_disp_map);
    ex.extract("Decoder/Sigmoid_4:0", center_map);
    ex.extract("out2", max_map);

    float* max_map_data = (float*)max_map.data;
    float* center_map_data = (float*)center_map.data;

    // Keep only cells where the center score equals the max-pooled score (local maxima).
    std::vector<std::pair<float, int>> sort_result(max_map.total());
    for (int i = 0; i < max_map.total(); i++)
    {
        if (max_map_data[i] == center_map_data[i])
        {
            sort_result[i] = std::pair<float, int>(max_map_data[i], i);
        }
    }

    // Guard against topk exceeding the number of cells before partial_sort.
    topk = std::min(topk, (int)sort_result.size());
    std::partial_sort(sort_result.begin(), sort_result.begin() + topk, sort_result.end(), std::greater<std::pair<float, int>>());

    // Convert the flat indices of the top-k cells to (x, y) positions.
    std::vector<std::pair<int, int>> topk_pts;
    for (int i = 0; i < topk; i++)
    {
        int x = sort_result[i].second % out_size;
        int y = sort_result[i].second / out_size;
        topk_pts.push_back(std::pair<int, int>(x, y));
    }

    // Channels 0-1 hold the start displacement, channels 2-3 the end displacement.
    ncnn::Mat start_map = org_disp_map.channel_range(0, 2).clone();
    ncnn::Mat end_map = org_disp_map.channel_range(2, 2).clone();
    ncnn::Mat dist_map = ncnn::Mat(out_size, out_size, 1);

    // Squared per-axis difference between start and end displacement.
    float* start_map_data = (float*)start_map.data;
    float* end_map_data = (float*)end_map.data;
    for (int i = 0; i < start_map.total(); i++)
    {
        start_map_data[i] = (start_map_data[i] - end_map_data[i]) * (start_map_data[i] - end_map_data[i]);
    }

    // Segment length per cell: sqrt(dx^2 + dy^2).
    float* dist_map_data = (float*)dist_map.data;
    for (int i = 0; i < start_map.total() / 2; i++)
    {
        dist_map_data[i] = std::sqrt(start_map_data[i] + start_map_data[i + start_map.channel(0).total()]);
    }

    for (int i = 0; i < topk_pts.size(); ++i)
    {
        int x = topk_pts[i].first;
        int y = topk_pts[i].second;
        float distance = dist_map_data[y * out_size + x];
        // Keep segments that are confident enough and long enough.
        if (sort_result[i].first > score_threshold && distance > dist_threshold)
        {
            int disp_x_start = org_disp_map.channel(0)[y * out_size + x];
            int disp_y_start = org_disp_map.channel(1)[y * out_size + x];
            int disp_x_end = org_disp_map.channel(2)[y * out_size + x];
            int disp_y_end = org_disp_map.channel(3)[y * out_size + x];

            // The output map is half the input resolution, hence the factor 2; clamp to [0, target_size].
            int x_start = std::max(std::min((int)((x + disp_x_start) * 2), target_size), 0);
            int y_start = std::max(std::min((int)((y + disp_y_start) * 2), target_size), 0);
            int x_end = std::max(std::min((int)((x + disp_x_end) * 2), target_size), 0);
            int y_end = std::max(std::min((int)((y + disp_y_end) * 2), target_size), 0);

            // Map back to source-image coordinates.
            lines.push_back(Line{ cv::Point(x_start * w_scale, y_start * h_scale), cv::Point(x_end * w_scale, y_end * h_scale) });
        }
    }
    return 0;
}
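The two central steps of the inference code, picking the strongest local maxima and decoding a displacement into segment endpoints, can be sketched without ncnn or OpenCV. This is a minimal illustration on plain arrays, not the author's implementation: `Segment`, `selectPeaks`, and `decodeSegment` are hypothetical names introduced here, and the maps are assumed to be row-major vectors.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical plain-data segment (stand-in for the article's Line type).
struct Segment { int x1, y1, x2, y2; };

// Return up to topk (score, flat index) pairs for cells whose center score
// equals the max-pooled score, sorted by descending score.
std::vector<std::pair<float, int>> selectPeaks(const std::vector<float>& center,
                                               const std::vector<float>& pooled,
                                               int topk)
{
    assert(center.size() == pooled.size());
    std::vector<std::pair<float, int>> peaks;
    for (std::size_t i = 0; i < center.size(); ++i) {
        if (center[i] == pooled[i]) {
            peaks.emplace_back(center[i], static_cast<int>(i));
        }
    }
    // Clamp k so partial_sort never runs past the end of the vector.
    std::size_t k = std::min<std::size_t>(
        static_cast<std::size_t>(std::max(topk, 0)), peaks.size());
    std::partial_sort(peaks.begin(), peaks.begin() + k, peaks.end(),
                      std::greater<std::pair<float, int>>());
    peaks.resize(k);
    return peaks;
}

// Decode one cell (x, y) of the half-resolution map plus its four
// displacements (start dx, start dy, end dx, end dy) into endpoints in
// network-input coordinates, clamped to [0, target_size].
Segment decodeSegment(int x, int y, const float disp[4], int target_size)
{
    auto toInput = [target_size](int cell, float d) {
        int v = (cell + static_cast<int>(d)) * 2;  // truncate, then upscale by 2
        return std::max(std::min(v, target_size), 0);
    };
    return Segment{ toInput(x, disp[0]), toInput(y, disp[1]),
                    toInput(x, disp[2]), toInput(y, disp[3]) };
}
```

Unlike the listing above, this sketch collects only the matching cells before sorting, so a `topk` larger than the number of peaks simply returns fewer results.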
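Result:
Fitting the edges
Once the object's edge segments have been detected, sort and fit them to find the four corner points of the document used for rectification.
Code: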
int MLSD::detectEdge(const cv::Mat& cv_src, std::vector<cv::Point>& out_points)
{
    std::vector<Line> lines;
    detectLines(cv_src, lines);

    // Split the segments into roughly horizontal and roughly vertical groups.
    std::vector<Line> h_lines, v_lines;
    for (const auto& v : lines)
    {
        double delta_x = v._p1.x - v._p2.x, delta_y = v._p1.y - v._p2.y;
        if (fabs(delta_x) > fabs(delta_y))
        {
            h_lines.push_back(v);
        }
        else
        {
            v_lines.push_back(v);
        }
    }

    if (h_lines.size() >= 2 && v_lines.size() >= 2)
    {
        std::sort(h_lines.begin(), h_lines.end(), cmpLineY);
        std::sort(v_lines.begin(), v_lines.end(), cmpLineX);

        // Intersect the outermost lines. Assuming cmpLineY and cmpLineX sort in
        // ascending order, the corners come out as top-left, top-right,
        // bottom-left, bottom-right.
        out_points.push_back(computeIntersect(h_lines[0], v_lines[0]));
        out_points.push_back(computeIntersect(h_lines[0], v_lines[v_lines.size() - 1]));
        out_points.push_back(computeIntersect(h_lines[h_lines.size() - 1], v_lines[0]));
        out_points.push_back(computeIntersect(h_lines[h_lines.size() - 1], v_lines[v_lines.size() - 1]));
    }
    else
    {
        // Fallback: the image border inset by 2 px, in the same corner order
        // (top-left, top-right, bottom-left, bottom-right) that reviseImage expects.
        out_points.push_back(cv::Point2f(2, 2));
        out_points.push_back(cv::Point2f(cv_src.cols - 2, 2));
        out_points.push_back(cv::Point2f(2, cv_src.rows - 2));
        out_points.push_back(cv::Point2f(cv_src.cols - 2, cv_src.rows - 2));
    }

    // NOTE: detectLines() as listed already multiplies segment endpoints by
    // w_scale / h_scale, so verify that this second scaling is wanted in your build.
    for (int i = 0; i < out_points.size(); i++)
    {
        cv::Point& p = out_points.at(i);
        p.x = p.x * w_scale;
        p.y = p.y * h_scale;
        // Clamp to the image bounds.
        p.x = std::max(0, std::min(p.x, cv_src.cols));
        p.y = std::max(0, std::min(p.y, cv_src.rows));
    }
    return 0;
}
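The listing calls `cmpLineY`, `cmpLineX`, and `computeIntersect`, whose bodies are not shown in the article. The sketch below is one plausible way to write such helpers, using hypothetical names (`Pt`, `Seg`, `cmpSegY`, `cmpSegX`, `intersectLines`) and plain structs instead of OpenCV types. It assumes segments are ordered by their midpoints and that the corner is the intersection of the infinite lines through the two segments.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical plain-data types (stand-ins for cv::Point and the article's Line).
struct Pt { double x, y; };
struct Seg { Pt p1, p2; };

// Order roughly horizontal segments top to bottom by midpoint y
// (comparing sums is equivalent to comparing midpoints).
bool cmpSegY(const Seg& a, const Seg& b)
{
    return a.p1.y + a.p2.y < b.p1.y + b.p2.y;
}

// Order roughly vertical segments left to right by midpoint x.
bool cmpSegX(const Seg& a, const Seg& b)
{
    return a.p1.x + a.p2.x < b.p1.x + b.p2.x;
}

// Intersect the infinite lines through a and b.
// Returns false when the lines are (nearly) parallel.
bool intersectLines(const Seg& a, const Seg& b, Pt& out)
{
    const double d1x = a.p2.x - a.p1.x, d1y = a.p2.y - a.p1.y;
    const double d2x = b.p2.x - b.p1.x, d2y = b.p2.y - b.p1.y;
    const double denom = d1x * d2y - d1y * d2x;  // cross(d1, d2)
    if (std::fabs(denom) < 1e-9) {
        return false;
    }
    // Solve a.p1 + t * d1 = b.p1 + s * d2 for t.
    const double t = ((b.p1.x - a.p1.x) * d2y - (b.p1.y - a.p1.y) * d2x) / denom;
    out = Pt{ a.p1.x + t * d1x, a.p1.y + t * d1y };
    return true;
}
```

Returning a success flag lets the caller fall back to the image border when two candidate edges are parallel, rather than dividing by zero.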
Result:
Edge rectification
Rectify the document's perspective distortion using the detected corner points.
int MLSD::reviseImage(const cv::Mat& cv_src, cv::Mat& cv_dst, std::vector<cv::Point>& in_points)
{
    // in_points order: top-left, top-right, bottom-left, bottom-right.
    // The bounding box of the four corners defines the output size.
    cv::Point point_f, point_b;
    point_f.x = std::min(in_points.at(0).x, in_points.at(2).x);
    point_f.y = std::min(in_points.at(0).y, in_points.at(1).y);
    point_b.x = std::max(in_points.at(3).x, in_points.at(1).x);
    point_b.y = std::max(in_points.at(3).y, in_points.at(2).y);

    cv::Rect rect(point_f, point_b);
    cv_dst = cv::Mat::zeros(rect.height, rect.width, CV_8UC3);

    // Target corners of the rectified image, in the same order as in_points.
    std::vector<cv::Point2f> dst_pts;
    dst_pts.push_back(cv::Point2f(0, 0));
    dst_pts.push_back(cv::Point2f(rect.width - 1, 0));
    dst_pts.push_back(cv::Point2f(0, rect.height - 1));
    dst_pts.push_back(cv::Point2f(rect.width - 1, rect.height - 1));

    // Source corners converted to floating point.
    std::vector<cv::Point2f> tr_points;
    tr_points.push_back(in_points.at(0));
    tr_points.push_back(in_points.at(1));
    tr_points.push_back(in_points.at(2));
    tr_points.push_back(in_points.at(3));

    // Compute the perspective transform and warp the source image.
    cv::Mat transmtx = cv::getPerspectiveTransform(tr_points, dst_pts);
    cv::warpPerspective(cv_src, cv_dst, transmtx, cv_dst.size());
    return 0;
}
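To make the rectification step less of a black box, here is a minimal sketch of what a four-point perspective transform computes: the 3x3 homography, with its last entry fixed to 1, that maps each source corner to its target corner. It is written in plain C++ for illustration and is not OpenCV's code; `P2`, `solveHomography`, and `applyHomography` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <utility>

// Hypothetical plain-data point (stand-in for cv::Point2f).
struct P2 { double x, y; };

// Solve for the homography h (row-major 3x3, h[8] fixed to 1) that maps
// src[i] to dst[i]. Returns false if the system is singular, for example
// when three of the points are collinear.
bool solveHomography(const std::array<P2, 4>& src, const std::array<P2, 4>& dst,
                     std::array<double, 9>& h)
{
    // Each correspondence (x, y) -> (u, v) contributes two linear equations
    // in the eight unknowns h[0..7]; the ninth column is the right-hand side.
    double a[8][9] = {};
    for (int i = 0; i < 4; ++i) {
        const double x = src[i].x, y = src[i].y, u = dst[i].x, v = dst[i].y;
        const double r0[9] = { x, y, 1, 0, 0, 0, -u * x, -u * y, u };
        const double r1[9] = { 0, 0, 0, x, y, 1, -v * x, -v * y, v };
        for (int j = 0; j < 9; ++j) {
            a[2 * i][j] = r0[j];
            a[2 * i + 1][j] = r1[j];
        }
    }
    // Gauss-Jordan elimination with partial pivoting.
    for (int col = 0; col < 8; ++col) {
        int piv = col;
        for (int r = col + 1; r < 8; ++r) {
            if (std::fabs(a[r][col]) > std::fabs(a[piv][col])) piv = r;
        }
        if (std::fabs(a[piv][col]) < 1e-12) return false;
        for (int j = 0; j < 9; ++j) std::swap(a[col][j], a[piv][j]);
        for (int r = 0; r < 8; ++r) {
            if (r == col) continue;
            const double f = a[r][col] / a[col][col];
            for (int j = col; j < 9; ++j) a[r][j] -= f * a[col][j];
        }
    }
    for (int i = 0; i < 8; ++i) h[i] = a[i][8] / a[i][i];
    h[8] = 1.0;
    return true;
}

// Map a point through the homography (projective division by w).
P2 applyHomography(const std::array<double, 9>& h, P2 p)
{
    const double w = h[6] * p.x + h[7] * p.y + h[8];
    return P2{ (h[0] * p.x + h[1] * p.y + h[2]) / w,
               (h[3] * p.x + h[4] * p.y + h[5]) / w };
}
```

With corners given in the order top-left, top-right, bottom-left, bottom-right, `applyHomography` should send each detected corner to the matching corner of the output rectangle. Warping the image then amounts to sampling the source through this mapping for every output pixel.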
Resources
The executable with a GUI, the source code, the models, and the dependency libraries have all been uploaded to CSDN: https://download.csdn.net/download/matt45m/75782750