Tech blog

[RPA] Convert PDF invoices to text using PAD. Avoid the pitfalls of using JavaScript and regular expressions.

Published: 2022.10.18 Last updated: 2025.03.04


That minion

In this article, I'll show how to convert invoice PDFs to text with PAD, and walk through the pitfalls I ran into when running JavaScript!


Invoices arrive every month, so the monthly processing is tough, isn't it? In my case, to put it simply, I have to tally how much each AWS account (each business division) used, based on the AWS invoice. However, each invoice PDF runs to dozens of pages, and there are dozens of files, so until I automated it, the job was quite hard.

For example, if you convert AWS invoice PDFs to text with PAD and then do a little processing with regular expressions, you can output a TSV file like the following in one shot.

AWS account	Usage fee	Consumption tax
hogehoge-AmiVoice00 (012345678900)	$1,234,567.89	$123,456.78
hogehoge-AmiVoice01 (012345678901)	$123,456.78	$12,345.67
...	...	...
hogehoge-AmiVoice99 (012345678999)	$0.12	$0.01

You need to write regular expressions for each type of invoice, but once they're written, the monthly hassle is gone!

Table of contents

Introduction
1. Explanation of the overall flow
2. Explanation of each action and tricky points
Final thoughts

Introduction

About PAD (Power Automate Desktop)

PAD is short for Power Automate Desktop, an RPA (Robotic Process Automation) tool. Microsoft has offered it free of charge since June 2021. With PAD, even people with no programming experience can easily build automation apps through no-code/low-code development. PAD is available as a download from Microsoft.

What you'll be able to do

・Convert PDFs to text (* handwritten PDFs are out of scope for this article)

・Run JavaScript in PAD (the biggest pitfalls are explained)

・Automate daily work (leaving more time to focus on other important work)

What you need

・An environment that can run PAD (version at the time of writing: 2.23.114.22217)

・The PDFs you want to convert to text

1. Explanation of the overall flow

The completed flow looks like this.

First, here is a rough outline of the flow from start to finish.

Overall flow of converting PDF invoices to text

Action 1: Get files in folder
⇒ Search the specified folder for PDFs.

Action 2: For each (loop)
⇒ Repeat the actions inside the loop for each PDF.

Action 3: Extract text from PDF
⇒ Convert the PDF to text.

Action 4: Replace text
⇒ Replace characters that cause problems when running JavaScript.

Action 5: Escape text for regular expression
⇒ Escape special characters that cause problems when running JavaScript.

Action 6: Run JavaScript
⇒ Run the JavaScript (the regular expressions are written here).

Action 7: Write text to file
⇒ Save the JavaScript output to a file.

Action 8: Move file(s)
⇒ Move the converted PDF to the specified folder.

Action 9: End (loop)
⇒ If there are more PDFs, return to Action 2; otherwise, end the loop.

In short, the flow converts every PDF in the specified folder to text (saving the JavaScript output) and moves each processed PDF to another folder.

2. Explanation of each action and tricky points

Action 1: Get files in folder

First, let's get the list of PDFs you want to convert to text.
■Action: "Folder" ⇒ "Get files in folder"

Folder: Get files in folder

"Folder":
Specify the path (location) of the PDFs you want to convert to text. In this example, AWS bill\AWS Inc. is specified.

"File filter":
Entering *.pdf in this field retrieves the paths of all files with the .pdf extension.

"Flow variable Files":
This variable stores the list of paths of the files matched by the search.

You now have the list of paths of the PDFs to convert to text.
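To make the "*.pdf" filter concrete, here is a rough JavaScript approximation of wildcard matching. It is an assumption for illustration, not PAD's actual implementation: "*" matches any run of characters, "?" matches exactly one character, and everything else must match literally. Matching is case-insensitive on the assumption that it mirrors Windows file names, and the file names are hypothetical.

```javascript
// Convert a simple wildcard filter such as "*.pdf" into an anchored RegExp.
function wildcardToRegExp(filter) {
  // Escape regex metacharacters, leaving the two wildcards (* and ?) alone.
  var escaped = filter.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  var pattern = escaped.replace(/\*/g, ".*").replace(/\?/g, ".");
  // Case-insensitive, assuming Windows-style file-name comparison.
  return new RegExp("^" + pattern + "$", "i");
}

// Keep only the names that match the filter, similar to the "Files" list.
function filterFileNames(names, filter) {
  var re = wildcardToRegExp(filter);
  var matched = [];
  for (var i = 0; i < names.length; i++) {
    if (re.test(names[i])) {
      matched.push(names[i]);
    }
  }
  return matched;
}

// Hypothetical file names in the source folder.
console.log(filterFileNames(
  ["invoice_01.pdf", "INVOICE_02.PDF", "notes.txt", "invoice_03.pdf.bak"],
  "*.pdf"
)); // [ 'invoice_01.pdf', 'INVOICE_02.PDF' ]
```

Because the pattern is anchored at both ends, "invoice_03.pdf.bak" is excluded even though it contains ".pdf".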

Action 2: For each (loop)

Next, in case there are several PDFs to convert, let's set up a loop that repeats the processing for each PDF.
■Action: "Loops" ⇒ "For each"

Loops: For each

"Value to iterate":
Specify the list of PDF paths. Enter the flow variable %Files% in this field. Whenever you reference a variable inside a PAD action, you need to enclose its name in % on both ends.

"Flow variable CurrentItem":
This variable holds the path of one PDF. Each time the processing repeats, it is replaced with the path of the next PDF.

You're now ready to process each PDF in turn.

Action 3: Extract text from PDF

Now, back to the main topic: let's convert the PDF to text.
■Action: "PDF" ⇒ "Extract text from PDF"

PDF: Extract text from PDF

"PDF file":
Specify the PDF path. Enter the flow variable %CurrentItem% in this field.

"Flow variable ExtractedPDFText":
This variable stores the text extracted from the PDF. In fact, this alone completes the PDF-to-text conversion.

If you simply want to convert the invoice PDF to text, you can skip ahead to Action 7: Write text to file. There, just enter the flow variable %ExtractedPDFText% in "Text to write" and choose where to save the file.

If you want to shape the text into CSV or TSV format afterwards, a script (JavaScript, in this case) using regular expressions is convenient.
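One practical reason TSV is the easier target: amounts such as $1,234,567.89 in the sample output at the top of this article contain commas. In CSV, every such field has to be quoted, while in TSV it can be written as-is. The sketch below shows both. CSV quoting follows RFC 4180 (quote fields that contain a comma, double quote, or line break, and double any embedded quotes). The helper names are hypothetical, and the TSV version assumes no field contains a tab or line break.

```javascript
// RFC 4180-style CSV field: quote if needed and double any embedded quotes.
function toCsvField(value) {
  if (/[",\r\n]/.test(value)) {
    return '"' + value.replace(/"/g, '""') + '"';
  }
  return value;
}

function toCsvRow(fields) {
  var quoted = [];
  for (var i = 0; i < fields.length; i++) {
    quoted.push(toCsvField(fields[i]));
  }
  return quoted.join(",");
}

// TSV needs no quoting here, assuming no field contains a tab or line break.
function toTsvRow(fields) {
  return fields.join("\t");
}

var row = ["hogehoge-AmiVoice00 (012345678900)", "$1,234,567.89", "$123,456.78"];
console.log(toCsvRow(row)); // hogehoge-AmiVoice00 (012345678900),"$1,234,567.89","$123,456.78"
console.log(toTsvRow(row)); // same fields separated by tabs, amounts untouched
```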

Action 4: Replace text

Running a script such as JavaScript in PAD comes with a few pitfalls, and avoiding them takes a little ingenuity. In this action, we replace characters that cause problems when the JavaScript runs. In this example, we delete the " characters contained in the flow variable %ExtractedPDFText%.
■Action: "Text" ⇒ "Replace text"

Text: Replace text

"Text to parse":
Specify the variable that contains the text. Enter the flow variable %ExtractedPDFText% in this field.

"Text to find":
Specify the text to be replaced. Enter " (one double quote) in this field. If the text contains both " and ', you must delete one of them; otherwise the variable cannot be passed to the script. The reason is explained in Action 6: Run JavaScript. This time, I entered " in order to delete the double quotes.

"Replace with":
Specify the replacement text. Enter %''% (two single quotes) in this field. This is how you delete text, that is, replace it with nothing. If you leave this field empty, you get the following error: (Parameter 'replacement text': It cannot be empty.)

"Flow variable Replaced":
This variable stores the text after the deletion.

This removes every " from the text.
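Why does a stray " matter? The sketch below assumes, as the errors described in Action 6 suggest, that PAD pastes a variable's text into the script source verbatim. Under that assumption, a double quote inside the value closes the string literal early, and the rest of the text becomes invalid code. `new Function` is used only to check whether the resulting source parses. The variable name %Text% and the sample line are hypothetical.

```javascript
// Hypothetical model: PAD replaces %Name% with the value's raw text.
function fillPlaceholder(template, name, value) {
  return template.split("%" + name + "%").join(value);
}

// True if the source parses as JavaScript; new Function parses it without running it.
function compiles(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    return false;
  }
}

// Same idea as the "Replace text" step: delete every double quote.
function removeDoubleQuotes(text) {
  return text.replace(/"/g, "");
}

var template = 'text = "%Text%";'; // %Text% is a hypothetical PAD variable
var sample = 'Invoice for "Amazon Web Services" $10.00'; // hypothetical extracted line
console.log(compiles(fillPlaceholder(template, "Text", sample)));                     // false
console.log(compiles(fillPlaceholder(template, "Text", removeDoubleQuotes(sample)))); // true
```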

Action 5: Escape text for regular expression

There is yet another pitfall that needs a workaround. In this action, we escape special characters that cause problems when the JavaScript runs.
■Action: "Text" ⇒ "Escape text for regular expression"

Text: Escape text for regular expression

"Text to escape":
Specify the text to escape. Enter the flow variable %Replaced% in this field. If the text still contains actual line-break characters, it cannot be passed to the script. This action replaces (escapes) those line breaks with the newline sequence \n. Note that Action 4: Replace text cannot replace line breaks with \n.

"Flow variable EscapedText":
This variable stores the escaped text.

This replaces the line breaks in the text with \n.
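Why does escaping for a regular expression make the text safe to embed in a script? The sketch below assumes PAD's action behaves like .NET's Regex.Escape, which escapes \, *, +, ?, |, {, [, (, ), ^, $, ., #, and white space. Line breaks become the two-character sequence \n, and the other characters simply gain a leading backslash. Inside a JavaScript string literal, \n turns back into a line break. An unrecognized escape such as \$, or a backslash before a space, evaluates to the character itself. As a result, the original text survives the round trip. `regexEscape` is a hypothetical helper, not a PAD function.

```javascript
// Approximation of "Escape text for regular expression" (assumed to match .NET Regex.Escape).
var CONTROL_ESCAPES = { "\t": "\\t", "\n": "\\n", "\f": "\\f", "\r": "\\r", " ": "\\ " };

function regexEscape(text) {
  return text.replace(/[\\*+?|{\[()^$.#\t\n\f\r ]/g, function (ch) {
    // Whitespace becomes an escape sequence; metacharacters get a backslash.
    return CONTROL_ESCAPES.hasOwnProperty(ch) ? CONTROL_ESCAPES[ch] : "\\" + ch;
  });
}

// Simulate pasting text between double quotes in a script and evaluating it.
function evalAsStringLiteral(text) {
  return new Function('return "' + text + '";')();
}

console.log(regexEscape("Tax $1.00\nTotal")); // Tax\ \$1\.00\nTotal

var original = "hogehoge-AmiVoice00 (012345678900)\r\nCharges $1,234,567.89";
console.log(evalAsStringLiteral(regexEscape(original)) === original); // true
```

Passing the unescaped text instead fails to parse, because a raw line break is not allowed inside a JavaScript string literal.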

Action 6: Run JavaScript

With the preprocessing for running the script finally complete, let's write the JavaScript, keeping the following points in mind.
■Action: "Scripting" ⇒ "Run JavaScript"

Scripting: Run JavaScript

"JavaScript to run":
Enter the JavaScript source code in this field. Write the regular expressions to match the format of your invoice.

Numeric variables defined in PAD can be passed to JavaScript as %hogehoge%. Text variables, however, must be passed with their names enclosed in " or ', as "%hogehoge%" or '%hogehoge%'. If you omit the quotes, you get a Microsoft JScript compilation error saying the string is not terminated correctly. If the text contains no ", you can pass it as "%hogehoge%". If it contains no ', you can pass it as '%hogehoge%'.

Action 4: Replace text deletes one kind of quote when the text contains both " and '. The reason is that text variables must be passed enclosed in " or ', so if the text contains both, you get an error such as: Microsoft JScript compilation error: Expected ';'.

When declaring variables in the JavaScript, var, let, and const cannot be used. If you write them, you get the following error: (Microsoft JScript compilation error: Expected ';')

To output the JavaScript results, you can use WScript.Echo() or WScript.StdOut.Write(). WScript.Echo() adds a line break after the output, and WScript.StdOut.Write() does not.
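The quoting rules above can be illustrated with a small simulation. It assumes, as the error messages suggest, that PAD replaces each %Name% with the variable's raw value before the script runs. The variable names PageCount and Title are hypothetical. A number pasted in unquoted is still valid code, but unquoted text is not.

```javascript
// Hypothetical model: PAD replaces each %Name% with the variable's raw value.
function fillPlaceholders(template, vars) {
  return template.replace(/%(\w+)%/g, function (whole, name) {
    return vars.hasOwnProperty(name) ? String(vars[name]) : whole;
  });
}

// True if the source parses as JavaScript (parsed only, not executed).
function compiles(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    return false;
  }
}

// Hypothetical PAD variables: one numeric, one text.
var vars = { PageCount: 42, Title: "AWS invoice 2022-09" };

console.log(fillPlaceholders("pages = %PageCount%;", vars));           // pages = 42;
console.log(compiles(fillPlaceholders("pages = %PageCount%;", vars))); // true: numbers need no quotes
console.log(compiles(fillPlaceholders('title = "%Title%";', vars)));   // true: text wrapped in quotes
console.log(compiles(fillPlaceholders("title = %Title%;", vars)));     // false: text becomes raw code
```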

"Flow variable JavascriptOutput":
This variable stores the JavaScript result (on success).

"Flow variable ScriptError":
This variable stores the JavaScript error message (on failure).

You can now run JavaScript in PAD.
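Here is a minimal sketch of a script body for Action 6 that turns invoice text into TSV rows like the sample at the top of this article. The invoice layout it parses is hypothetical (real AWS invoices will differ), so treat the regular expression as a template. The error messages show that PAD runs the script with Microsoft JScript, so the sketch sticks to old-style syntax: function and var, with no arrow functions. var is kept for readability. If your PAD version rejects var as described above, delete the keyword; every variable is initialized where it is declared, so the plain assignments still work. `invoiceTextToTsv` and `emit` are hypothetical names, and `emit` falls back to console.log only so the code can be tested outside PAD.

```javascript
// Assumed (hypothetical) layout of each account block in the extracted text:
//   <account name> (<12-digit account ID>)
//   Charges $<amount>
//   Tax $<amount>
function invoiceTextToTsv(text) {
  var pattern = /^(\S+ \(\d{12}\))\r?\nCharges (\$[\d,]+\.\d{2})\r?\nTax (\$[\d,]+\.\d{2})/gm;
  var rows = [];
  var match = null;
  // With the g flag, each exec() call continues from the previous match.
  while ((match = pattern.exec(text)) !== null) {
    rows.push(match[1] + "\t" + match[2] + "\t" + match[3]);
  }
  return rows.join("\r\n");
}

// In PAD, write the result with WScript.StdOut.Write (no trailing line break).
function emit(output) {
  if (typeof WScript !== "undefined") {
    WScript.StdOut.Write(output);
  } else {
    console.log(output); // local testing outside PAD
  }
}

// In PAD this line would read:  invoiceText = "%EscapedText%";
var invoiceText = "hogehoge-AmiVoice00 (012345678900)\nCharges $1,234,567.89\nTax $123,456.78\n" +
  "hogehoge-AmiVoice01 (012345678901)\nCharges $123,456.78\nTax $12,345.67";
emit(invoiceTextToTsv(invoiceText));
```

WScript.StdOut.Write adds no trailing line break. You will therefore probably want "Append new line" turned on in Action 7 so that rows from consecutive PDFs don't run together.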

Action 7: Write text to file

Next, let's save the JavaScript output to a file.
■Action: "File" ⇒ "Write text to file"

File: Write text to file

"File path":
Specify the path of the save location (including the file name). In this example, AWS bill\AWS Inc.tsv is specified. * Both TSV and CSV formats are easy to produce with regular expressions.

"Text to write":
Specify the text you want to save. Enter the flow variable %JavascriptOutput% in this field. * If you don't use JavaScript (regular expressions), enter the flow variable %ExtractedPDFText% instead.

"Append new line":
Specify whether to add a line break after the text is written. Either on or off is fine.

"If file exists":
Specify whether to overwrite the existing content or append to the end. Here, the file is written inside the loop, so select the option to append content.

This saves the text to a file.

Action 8: Move file(s)

Use this action if you want to move converted PDFs to another folder to manage them.
■Action: "File" ⇒ "Move file(s)"

File: Move file(s)

"File(s) to move":
Specify the PDF path. Enter the flow variable %CurrentItem% in this field.

"Destination folder":
Specify the path of the folder to move the file to. In this example, AWS bill\AWS Inc._Processed is specified.

"If file exists":
Specify whether to overwrite the existing file or do nothing. I usually select overwrite, but it's up to you.

This moves converted PDFs to another folder.

Action 9: End (loop)

This action is added automatically when you set up Action 2: For each. If there are still PDFs that haven't been converted, the flow returns to Action 2 and loops. Once all PDFs have been processed, the loop (and the whole flow) ends.

Final thoughts

How did it go? Were you able to convert your PDFs to text? The initial setup may have been a little difficult. However, if you customize these steps to fit your business flow, I think monthly invoice processing can be done with a single click.

That minion

Aiming to make most work doable with just a click, I'm constantly experimenting with business automation, RPA, and MA.
