Chris Castiglione, Co-founder of Console.xyz. Adjunct Professor at Columbia University Business School.

How to Automate Filling In Web Forms with Python

14 min read

Python is a great, easy-to-learn programming language that can help you automate everyday tasks and make your life easier.

Have you ever needed to fill in the same online forms multiple times per day? If so, Python can help you automate most of these tedious tasks. Join me on this journey to learn how a basic Python script can automate web-based data entry.

Yes, you can use Python to fill out an online form.

Download the Complete Project 

Before we begin, here is the completed Python script, as well as the web form I’ll reference.

How to Extract Data from a PDF with Python

Three Types of PDF Formats

1. Text-Based PDF Format

There are three ways data can be stored in a PDF. The most common way is to have the data as text within the PDF file; this is known as a text-based PDF.

In this case, the PDF is nothing more than an unstructured document (without a specific layout, e.g. a manual) or a semi-structured document (one that conforms to a layout, e.g. an invoice), where the data is basically the text that resides inside the PDF file itself, which is visible and readable to the human eye. Below is an example.

Text-based PDF example

2. Image-Based PDF Format

Another common type of PDF file is what is known as an image-based PDF. These are PDFs that are literally scanned copies of paper documents. The text that is visible and readable to the human eye is really part of the image and can only be extracted by using Optical Character Recognition (OCR).

Extracting the text contained within these PDFs is harder, as specialized OCR software is required, and even then there is no guarantee that the extracted text is fully accurate, as the outcome depends on the quality of the scanned image.

Besides that, the scanned image within the PDF may not be in the correct orientation, which makes extracting any details even more difficult. Below is an example of such a file, which is basically a scanned image with the wrong orientation, embedded within a PDF.

3. PDF Forms Format

Finally, there’s a third type of PDF file that is neither one nor the other. The information contained within this type of PDF file is data that is kept within fillable fields inside the PDF. This type of document is known as a PDF form.

PDF forms can easily be created using specialized software such as Adobe Acrobat or PDFelement.

Running the Python Form Filling Script

Before we start, let’s look at an example of the online mortgage loan application we’re going to fill in. This is how my folder looks: it contains the Python script, the .ini files and the PDF form document with the applicant’s data.

Form Filling Script in Python

This is how the (empty) mortgage loan application online form looks.

The web form to fill in using our automated Python script

If I execute the Python script (.py), I see that a .txt file with the same name as the PDF form file gets created inside the folder where the Python script resides.

Our Python files after running the script

Next, let’s open the JavaScript (.txt) file that was created and copy all the code within it.

Copying the JavaScript code generated by our Python script

Now, back on the online form in the browser, let’s open the Developer tools, go to the Console tab and paste in the copied code.

open the Developer tools and then go to the Console tab

With the code pasted, just press Enter (with the focus inside the Console tab). The online form will then be automatically filled in with the same data contained in the PDF form document.

The online form will be automatically filled in

We can verify that the data filled into the online form is indeed the same as the data in the PDF form document.

An example of how the script filled in the web form

Creating the PDF from Scratch

One of the main problems with the first two types of PDF documents described (text-based PDFs and image-based PDFs) is that the information contained within the PDFs themselves is not organized.

This means that even if we are able to extract the text by programmatically reading the PDF file, or by performing an OCR operation on the image embedded within the PDF that contains the text, we still need to make sense of the resulting extracted text.

All that text will be nothing more than words within lines or sentences if we are not able to give any meaning to it. Figuring out how to find an invoice’s total amount in a block of text that contains multiple numbers is not an easy feat, and such a task requires a certain level of algorithmic intelligence.

Consequently, the first step to automating the data acquisition process is to change the way people send their information.

Instead of having people send scanned copies of their paper documents or PDF versions of their scanned employment letters when applying for a loan, why not have them fill in the data for you, by using a PDF form?

To get a sense of what this really means, take a look at the following PDF form document.

The New Customer Form Example

This document is a PDF file, just like any other PDF, with a small but important difference: it contains editable text boxes (fields) where data can be entered. In this particular example, it is a new customer onboarding PDF form that can be sent through email after being filled in manually by the applicant.

So, instead of having the applicant send in their information as scanned documents, let’s have them fill in all the required information in a PDF form, which already saves 90% or more of the time required to gather the information for applying for a mortgage loan.

Creating a PDF form document is a very simple process, and this video describes the steps involved.

In essence, a PDF form is a great way to get applicants to submit their information in a format that is easier to extract and process. For my financial advisor friend, this is the option that I recommend.

Automating the PDF Data Extraction Process

In our real-world scenario, we’ve achieved a major milestone, which is to get the data nicely organized and structured. This is why PDF forms are a great way of gathering data.

The next step is to write some Python code that can extract the data contained in the PDF form documents and create a JavaScript script which can afterwards be executed within the Console tab of the browser Developer tools to automatically fill in the online form. To better understand the whole process, let’s have a look at the following diagram.

Overview of how to automate form filling with a Python script

So in essence, the PDF form document is passed to the Python script, which reads the content of the document and checks each field. Then, for each field, the value of the field is extracted and JavaScript code is generated that references the name of the equivalent online (web) field.

This JavaScript script, containing the data extracted from the PDF form document, can then be executed on the online form we wish to fill in automatically, by opening the browser Developer tools and running the script in the Console tab.

The JavaScript script will automatically fill in the values of the fields of the online form. For all this to work properly, each field within the PDF form document must correspond to a field within the online form.
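Because everything hinges on this one-to-one correspondence, it can help to check it before generating any JavaScript. The following is a minimal sketch, not part of the article’s script: check_field_correspondence is a hypothetical helper that compares a list of PDF field names against a list of web form ids (however you collected them) and reports the names that exist on only one side. The field names in the demo are made up for illustration.

```python
def check_field_correspondence(pdf_fields, web_ids):
    """Compare PDF form field names with the ids of the online form fields.

    Returns a tuple (missing_on_web, missing_in_pdf), each a sorted list:
    - missing_on_web: PDF fields with no matching element id on the web form
    - missing_in_pdf: web form ids with no matching PDF field
    """
    pdf_set = set(pdf_fields)
    web_set = set(web_ids)
    return sorted(pdf_set - web_set), sorted(web_set - pdf_set)


if __name__ == '__main__':
    # Hypothetical field names, for illustration only
    pdf_fields = ['firstName', 'lastName', 'loanAmount']
    web_ids = ['firstName', 'lastName', 'email']
    missing_on_web, missing_in_pdf = check_field_correspondence(pdf_fields, web_ids)
    print('In the PDF but not on the web form:', missing_on_web)
    print('On the web form but not in the PDF:', missing_in_pdf)
```

Any name in the first list would generate a document.getElementById(...) call that returns null in the browser, so the pasted script would stop at that line with a TypeError and leave the remaining fields unfilled.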

To ensure that this is the case, it’s a recommended practice (when creating the PDF form document) to give each field the same name as the one found in the id tag of the corresponding online field.

So, before creating the PDF form document, you must use the browser to inspect the name of each online field. This is done by retrieving the value of the id attribute of the HTML element that corresponds to that field.
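Inspecting elements one by one works well for small forms. For a form with many fields, one option (an assumption on my part, not a step from the original process) is to save the page’s HTML from the browser and list the ids with Python’s built-in html.parser module. The sketch below also guesses each field’s kind from its tag and type attribute, which maps onto the .ini files described in the steps that follow; FormFieldIdCollector and collect_field_ids are hypothetical names. Forms that are rendered by JavaScript after the page loads may not appear in a saved copy, in which case inspecting in the browser remains the reliable route.

```python
from html.parser import HTMLParser

# Input types that are buttons rather than data-entry fields
SKIPPED_INPUT_TYPES = {'submit', 'button', 'reset', 'image'}


class FormFieldIdCollector(HTMLParser):
    """Collect (id, kind) pairs for input, select and textarea elements."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; boolean
        # attributes such as "checked" arrive with a value of None.
        attributes = dict(attrs)
        field_id = attributes.get('id')
        if not field_id or tag not in ('input', 'select', 'textarea'):
            return
        input_type = (attributes.get('type') or 'text').lower()
        if tag == 'input' and input_type in SKIPPED_INPUT_TYPES:
            return
        if tag == 'select':
            kind = 'select'       # drop-down: belongs in the .ini file
        elif tag == 'input' and input_type in ('checkbox', 'radio'):
            kind = 'checkable'    # belongs in the _ext.ini file
        else:
            kind = 'text'         # regular field
        self.fields.append((field_id, kind))


def collect_field_ids(html):
    """Return the (id, kind) pairs found in an HTML string, in document order."""
    collector = FormFieldIdCollector()
    collector.feed(html)
    collector.close()
    return collector.fields


if __name__ == '__main__':
    sample = ('<form><input id="firstName" type="text">'
              '<select id="loanType"><option>Fixed</option></select>'
              '<input id="agree" type="checkbox" checked>'
              '<input type="submit" id="send"></form>')
    for field_id, kind in collect_field_ids(sample):
        print(field_id, kind)
```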

Inspect the name of each online field with the browser.

Those same id values retrieved from each field are what you will use to name each of the PDF form fields in the PDF document that you will create (for your users to fill in later).

So now that we’ve reviewed how the data extraction automation process works, it’s important to keep in mind that to achieve it, there are some essential steps involved:

1. Identify which online form(s) you would like to automate. Get the id tag of each field that you want to automatically fill in using the JavaScript script that the Python script is going to generate, by inspecting the HTML element of each corresponding field of the online form(s).

2. Open Notepad or any other text editor and save (for your reference) these id names in a text (.txt) file, and give this file a name, e.g. fields.txt. This file is only for reference purposes and won’t be used by the Python script. You will need these field names when you create and design your PDF form using Adobe Acrobat or PDFelement.

3. If your online form has selectable fields (with a drop-down menu), then get the id tags of each of these fields and add them to an .ini (text) file. Save this file with the same name as you will use for your Python script.

The output from our PDF form

4. If your online form has fields that can be checked, e.g. radio buttons and checkbox fields, then get the id tags of each of these fields and add them to another .ini (text) file. Save this file with the same name as you will use for the Python script and append the _ext suffix to it, before the .ini extension.

Save this file with the same name as you will use for the Python script
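If you collected the ids as (id, kind) pairs, steps 2 to 4 boil down to writing three plain-text files with one id per line. The sketch below assumes the Python script is named myview.py, so its companion files are myview.ini and myview_ext.ini (the file names the script later reads). write_field_files and the 'text'/'select'/'checkable' labels are hypothetical, and the demo writes into a temporary folder so it cannot overwrite your real .ini files.

```python
import tempfile
from pathlib import Path


def write_field_files(fields, folder='.', script_name='myview'):
    """Write the reference file and the two .ini files from steps 2 to 4.

    fields is a list of (field_id, kind) pairs, where kind is 'text',
    'select' (drop-down) or 'checkable' (radio button or checkbox).
    Returns the list of paths written.
    """
    folder = Path(folder)
    outputs = {
        # Step 2: every id, for reference only (the Python script never reads it)
        folder / 'fields.txt': [fid for fid, _ in fields],
        # Step 3: drop-down (select) fields
        folder / (script_name + '.ini'): [fid for fid, kind in fields if kind == 'select'],
        # Step 4: radio-button and checkbox fields
        folder / (script_name + '_ext.ini'): [fid for fid, kind in fields if kind == 'checkable'],
    }
    for path, ids in outputs.items():
        # One id per line; an empty list produces an empty file
        path.write_text(''.join(fid + '\n' for fid in ids))
    return list(outputs)


if __name__ == '__main__':
    # Hypothetical ids, written to a temporary folder
    demo_fields = [('firstName', 'text'), ('loanType', 'select'), ('agree', 'checkable')]
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_field_files(demo_fields, folder=tmp):
            print(path.name, '->', path.read_text().split())
```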

Once these steps have been done, we are ready to write our Python script.

Writing the Python Script

The Python script is the heart and soul of the whole process. It’s where the magic happens. It takes a PDF form document, reads its content, identifies each field with its respective value and generates a JavaScript script which you can then use in the browser to automatically fill in your web form.

If you need to manually input different data into the same online form multiple times a day, having such a script can be an invaluable time-saving tool.

Take my friend the financial advisor, who has to enter mortgage loan data into the same online form, for different applicants, up to 30 times a day.

“I’m a financial advisor who helps people arrange mortgage loans. The process of entering all this data manually, for each application, is tedious, error-prone and time-consuming, and takes many hours in total. If I learn Python, could I write a script to automate filling in online forms? – Mike”

Imagine manually entering the data for a form that has at least 20 fields, for each person. Then imagine doing that 30 times a day. That’s 600 fields a day that need to be entered manually. Not a job I would be excited to do.

So at a high level, how does the Python script work? The script essentially does three things:

  1. It identifies all the fields that exist within the PDF form document.
  2. By checking the .ini file with the same name as the Python script, it is able to identify which fields have a drop-down selectable menu, which can contain multiple possible options. This is an important distinction when generating the JavaScript code. The file can be left empty if there are no selectable fields.
  3. By checking the _ext.ini file with the same name as the Python script, it is able to identify which fields are radio-button or checkbox fields. This is another crucial distinction when generating the JavaScript code. This file can be left empty if there are no radio-button or checkbox fields.

So, let’s look at the complete Python script and then break it down into smaller chunks, to understand it better. All the code was written using Python 3.6 or higher. You can download Python from the official site.

 

import os
import sys
from collections import OrderedDict
from PyPDF2 import PdfFileReader

def _getFields(obj, tree=None, retval=None, fileobj=None):
    fieldAttributes = {'/FT': 'Field Type', '/Parent': 'Parent', '/T': 'Field Name', 
    '/TU': 'Alternate Field Name', '/TM': 'Mapping Name', '/Ff': 'Field Flags', 
    '/V': 'Value', '/DV': 'Default Value'}
    if retval is None:
        retval = OrderedDict()
        catalog = obj.trailer["/Root"]
        if "/AcroForm" in catalog:
            tree = catalog["/AcroForm"]
        else:
            return None
    if tree is None:
        return retval

    obj._checkKids(tree, retval, fileobj)
    for attr in fieldAttributes:
        if attr in tree:
            obj._buildField(tree, retval, fileobj, fieldAttributes)
            break

    if "/Fields" in tree:
        fields = tree["/Fields"]
        for f in fields:
            field = f.getObject()
            obj._buildField(field, retval, fileobj, fieldAttributes)

    return retval

def get_form_fields(infile):
    reader = PdfFileReader(open(infile, 'rb'))
    # _getFields returns None when the PDF has no form (no /AcroForm)
    fields = _getFields(reader) or {}
    return OrderedDict((k, v.get('/V', '')) for k, v in fields.items())

def selectListOption(all_lines, k, v):
    # Emits a JS helper that selects the <option> whose text equals v
    all_lines.append('function setSelectedIndex(s, v) {')
    all_lines.append('for (var i = 0; i < s.options.length; i++) {')
    all_lines.append('if (s.options[i].text == v) {')
    all_lines.append('s.options[i].selected = true;')
    all_lines.append('return;')
    all_lines.append('}')
    all_lines.append('}')
    all_lines.append('}')
    all_lines.append('setSelectedIndex(document.getElementById("' + k + '"), "' + v + '");\n')

def readList(fname):
    lst = []
    with open(fname, 'r') as fh:  
        for l in fh:
            lst.append(l.rstrip(os.linesep))
    return lst

def createBrowserScript(fl, fl_ext, items, pdf_file_name):
    # Generate the script even if myview.ini is empty (no drop-down fields)
    if pdf_file_name:
        of = os.path.splitext(pdf_file_name)[0] + '.txt'
        all_lines = []
        for k, v in items.items():
            print(k + ' -> ' + str(v))
            if v in ['/Yes', '/On']:
                all_lines.append("document.getElementById('" + k + "').checked = true;\n")
            elif v in ['/0'] and k in fl_ext:
                all_lines.append("document.getElementById('" + k + "').checked = true;\n")
            elif v in ['/No', '/Off']:
                all_lines.append("document.getElementById('" + k + "').checked = false;\n")
            elif v in [''] and k in fl_ext:
                # An empty radio-button/checkbox value means "not selected"
                all_lines.append("document.getElementById('" + k + "').checked = false;\n")
            elif k in fl:
                selectListOption(all_lines, k, v)
            else:
                # Regular text field (an empty value simply clears it)
                all_lines.append("document.getElementById('" + k + "').value = '" + v + "';\n")
        with open(of, 'w') as outF:
            outF.writelines(all_lines)

def execute(args):
    try: 
        fl = readList('myview.ini')
        fl_ext = readList('myview_ext.ini')
        if len(args) == 2:
            pdf_file_name = args[1]
            items = get_form_fields(pdf_file_name)
            createBrowserScript(fl, fl_ext, items, pdf_file_name)
        else:
            files = [f for f in os.listdir('.') if os.path.isfile(f) and f.endswith('.pdf')]
            for f in files:
                items = get_form_fields(f)
                createBrowserScript(fl, fl_ext, items, f)
    except BaseException as msg:
        print('An error occurred... :( ' + str(msg))

if __name__ == '__main__':
    from pprint import pprint
    execute(sys.argv)

Import Our Python Libraries

So, let’s start from the very beginning. To make things happen, we’ll need to use some Python libraries.

import os
import sys
from collections import OrderedDict
from PyPDF2 import PdfFileReader

Every library is part of the Python standard library, except PyPDF2.

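The PyPDF2 library is required to read PDF form documents. Because the script relies on the legacy PdfFileReader class and its private helpers _checkKids and _buildField, it needs an older PyPDF2 release (PdfFileReader no longer works in PyPDF2 3.0 and later). Pinning the last 1.x release is the safest option; it can be installed using the following command:

pip install PyPDF2==1.26.0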

A Step-by-Step Guide to Reading the PDF Fields

Next, we have the _getFields function. The objective of this function is to read the fields within a PDF form document by inspecting the document’s field tree. This is achieved by using the following code.

def _getFields(obj, tree=None, retval=None, fileobj=None):
    fieldAttributes = {'/FT': 'Field Type', '/Parent': 'Parent', '/T': 'Field Name', 
    '/TU': 'Alternate Field Name', '/TM': 'Mapping Name', '/Ff': 'Field Flags', 
    '/V': 'Value', '/DV': 'Default Value'}
    if retval is None:
        retval = OrderedDict()
        catalog = obj.trailer["/Root"]
        if "/AcroForm" in catalog:
            tree = catalog["/AcroForm"]
        else:
            return None
    if tree is None:
        return retval

    obj._checkKids(tree, retval, fileobj)
    for attr in fieldAttributes:
        if attr in tree:
            obj._buildField(tree, retval, fileobj, fieldAttributes)
            break

    if "/Fields" in tree:
        fields = tree["/Fields"]
        for f in fields:
            field = f.getObject()
            obj._buildField(field, retval, fileobj, fieldAttributes)

    return retval

In essence, what this code does is start from the document’s root node (/Root), locate the /AcroForm dictionary, and then loop through the fields found under the fields tree (/Fields), getting each field object’s value by inspecting the field’s attributes.

Field attributes are used internally by PDF form documents to describe how fields are structured. By using field attributes, it is possible to determine a field’s name, its value and any flags or default values the field may have.
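The two attributes that say most about a field’s kind are the field type (/FT) and the field flags (/Ff), both listed in fieldAttributes above. Per the PDF specification, /FT is /Tx for text fields, /Btn for buttons (checkboxes, radio buttons and push buttons), /Ch for choice fields (list and combo boxes) and /Sig for signatures, while /Ff is an integer whose bits refine the type: bit 16 marks a radio button, bit 17 a push button and, for choice fields, bit 18 a combo box (drop-down). As a sketch of an alternative that the article does not use, these attributes could tell checkboxes and drop-downs apart without the .ini files. classify_pdf_field is a hypothetical helper, and note that /FT and /Ff can be inherited from a parent field, so a missing value may need to be looked up on the field’s /Parent.

```python
# Field flag bits from the PDF specification (bit positions are 1-based,
# so bit n has the value 1 << (n - 1)).
FLAG_RADIO = 1 << 15        # bit 16, button fields
FLAG_PUSHBUTTON = 1 << 16   # bit 17, button fields
FLAG_COMBO = 1 << 17        # bit 18, choice fields


def classify_pdf_field(field_type, flags=0):
    """Map a field's /FT value and /Ff integer to a coarse kind."""
    flags = int(flags or 0)  # /Ff is optional and defaults to 0
    if field_type == '/Btn':
        if flags & FLAG_PUSHBUTTON:
            return 'pushbutton'
        if flags & FLAG_RADIO:
            return 'radio'
        return 'checkbox'
    if field_type == '/Ch':
        return 'combo' if flags & FLAG_COMBO else 'list'
    if field_type == '/Tx':
        return 'text'
    if field_type == '/Sig':
        return 'signature'
    return 'unknown'


if __name__ == '__main__':
    print(classify_pdf_field('/Btn', 1 << 15))  # radio
    print(classify_pdf_field('/Ch', 1 << 17))   # combo
    print(classify_pdf_field('/Tx'))            # text
```

With PyPDF2, each entry returned by _getFields is a dictionary keyed by these PDF names, so field.get('/FT') and field.get('/Ff', 0) would typically supply the two arguments.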

Next, we have the get_form_fields function.

def get_form_fields(infile):
    reader = PdfFileReader(open(infile, 'rb'))
    # _getFields returns None when the PDF has no form (no /AcroForm)
    fields = _getFields(reader) or {}
    return OrderedDict((k, v.get('/V', '')) for k, v in fields.items())

This function simply reads the PDF form document, calls the _getFields function, and returns all the field values (/V) contained within the PDF form file that was read, as an ordered dictionary.
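If you would rather not depend on PyPDF2’s legacy PdfFileReader class, its maintained successor, pypdf, exposes field reading directly through PdfReader.get_fields(), which returns a dictionary of field dictionaries, or None when the document has no form. The sketch below is an alternative to get_form_fields under that assumption, not the article’s code; form_values and read_form_values are hypothetical names, and the value extraction is kept in a separate function so it can be checked without a PDF at hand.

```python
def form_values(fields):
    """Return {field name: value}, using '' for fields without a /V entry.

    fields is the mapping returned by pypdf's PdfReader.get_fields(),
    which is None when the PDF contains no form.
    """
    if not fields:
        return {}
    # Plain dicts keep insertion order (Python 3.7+), like OrderedDict
    return {name: field.get('/V', '') for name, field in fields.items()}


def read_form_values(pdf_path):
    """Read the form field values of a PDF using pypdf (pip install pypdf)."""
    from pypdf import PdfReader  # imported here so form_values works without pypdf
    reader = PdfReader(pdf_path)
    return form_values(reader.get_fields())


if __name__ == '__main__':
    import sys
    if len(sys.argv) > 1:
        for name, value in read_form_values(sys.argv[1]).items():
            print(name, '->', value)
```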

Next, we have the selectListOption function.

def selectListOption(all_lines, k, v):
    all_lines.append('function setSelectedIndex(s, v) {')
    all_lines.append('for (var i = 0; i < s.options.length; i++) {')
    all_lines.append('if (s.options[i].text == v) {')
    all_lines.append('s.options[i].selected = true;')
    all_lines.append('return;') 
    all_lines.append('}')
    all_lines.append('}')
    all_lines.append('}')
    all_lines.append('setSelectedIndex(document.getElementById("' + k + '"), "' + v + '");\n')

This function simply creates a JavaScript function that is capable of selecting at runtime (when the JavaScript script is copied to the Console window of the browser Developer tools and executed) the right drop-down option of an online form field, matching the value contained within the equivalent PDF form field.
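Two details of this generated JavaScript are worth knowing. First, setSelectedIndex is re-declared every time a drop-down field is processed, which works in the Console but is redundant. Second, the field id and value are pasted between double quotes by plain string concatenation, so a value containing a double quote would break the generated line. A minimal variant, offered as a sketch rather than the article’s code, emits the helper once and lets json.dumps produce properly escaped string literals (a JSON string is also a valid JavaScript string literal). SET_SELECTED_INDEX_JS, select_list_option_js and build_select_lines are hypothetical names.

```python
import json

# JavaScript helper, emitted once at the top of the generated script.
# It selects the <option> whose visible text equals the given value.
SET_SELECTED_INDEX_JS = (
    'function setSelectedIndex(s, v) {\n'
    '  for (var i = 0; i < s.options.length; i++) {\n'
    '    if (s.options[i].text == v) {\n'
    '      s.options[i].selected = true;\n'
    '      return;\n'
    '    }\n'
    '  }\n'
    '}\n'
)


def select_list_option_js(field_id, value):
    """Return one JavaScript line that selects `value` in the drop-down `field_id`."""
    # json.dumps quotes and escapes both strings for use as JS literals
    return 'setSelectedIndex(document.getElementById({}), {});\n'.format(
        json.dumps(field_id), json.dumps(str(value)))


def build_select_lines(pairs):
    """Return the helper followed by one call per (field_id, value) pair."""
    if not pairs:
        return []
    return [SET_SELECTED_INDEX_JS] + [select_list_option_js(k, v) for k, v in pairs]


if __name__ == '__main__':
    # Hypothetical drop-down ids and values
    print(''.join(build_select_lines([('loanType', '30-year "fixed"'), ('state', 'New York')])))
```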

Next, we have the readList function.

def readList(fname):
    lst = []
    with open(fname, 'r') as fh:  
        for l in fh:
            lst.append(l.rstrip(os.linesep))
    return lst

This function is simply used to read the .ini files we might have for selectable (drop-down), radio-button and checkbox fields.
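readList keeps every line, including blank ones, and only strips the line ending, so an id typed with a trailing space would silently fail to match its field; it also raises an error if an .ini file is missing, which stops the whole script. A slightly more forgiving variant, sketched under the assumption that you would rather treat a missing file as empty (read_list is a hypothetical name), could look like this:

```python
import os


def read_list(fname):
    """Return the non-blank lines of fname with surrounding whitespace removed.

    A missing file is treated as an empty list (an assumption; the
    article's readList raises an error instead).
    """
    if not os.path.exists(fname):
        return []
    with open(fname, 'r') as fh:
        return [line.strip() for line in fh if line.strip()]
```

Swapping it in would only require replacing the two readList calls in the execute function.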

Next, we have the createBrowserScript function.

def createBrowserScript(fl, fl_ext, items, pdf_file_name):
    # Generate the script even if myview.ini is empty (no drop-down fields)
    if pdf_file_name:
        of = os.path.splitext(pdf_file_name)[0] + '.txt'
        all_lines = []
        for k, v in items.items():
            print(k + ' -> ' + str(v))
            if v in ['/Yes', '/On']:
                all_lines.append("document.getElementById('" + k + "').checked = true;\n")
            elif v in ['/0'] and k in fl_ext:
                all_lines.append("document.getElementById('" + k + "').checked = true;\n")
            elif v in ['/No', '/Off']:
                all_lines.append("document.getElementById('" + k + "').checked = false;\n")
            elif v in [''] and k in fl_ext:
                # An empty radio-button/checkbox value means "not selected"
                all_lines.append("document.getElementById('" + k + "').checked = false;\n")
            elif k in fl:
                selectListOption(all_lines, k, v)
            else:
                # Regular text field (an empty value simply clears it)
                all_lines.append("document.getElementById('" + k + "').value = '" + v + "';\n")
        with open(of, 'w') as outF:
            outF.writelines(all_lines)

This is the main part of the Python script. It is responsible for creating the JavaScript script which will be executed in the browser.

It basically goes through all the PDF form fields and generates, for each one, the corresponding JavaScript code that, when executed, will automatically fill in the value of the corresponding online field, depending on whether the field is a regular field, a selectable field, a radio button or a checkbox.
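One practical caveat applies to the regular-field branch: the value is inserted between single quotes as-is, so an applicant named O’Brien (typed with a plain apostrophe) would produce a line of JavaScript with a syntax error. A minimal fix, sketched here as a hypothetical helper rather than the article’s code, is to let json.dumps build the quoted literal. Separately, on pages built with frameworks that track input state themselves, assigning .value directly may not be noticed by the page; that limitation applies to the article’s approach as well.

```python
import json


def value_statement(field_id, value):
    """Return a JavaScript line that sets the value of a regular form field.

    json.dumps adds the quotes and escapes characters such as quotes and
    backslashes, so a value like O'Brien cannot break the generated script.
    """
    return 'document.getElementById({}).value = {};\n'.format(
        json.dumps(field_id), json.dumps(str(value)))


def naive_value_statement(k, v):
    # The string concatenation used in createBrowserScript, for comparison
    return "document.getElementById('" + k + "').value = '" + v + "';\n"


if __name__ == '__main__':
    print(naive_value_statement('lastName', "O'Brien"), end='')  # broken JavaScript
    print(value_statement('lastName', "O'Brien"), end='')        # valid JavaScript
```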

The function saves the JavaScript script next to the input PDF form document, which in this setup is the same folder the Python script runs from (and also where the .ini files are located). The JavaScript script is saved with the same name as the input PDF form document provided to the Python script.

So, if the input PDF form file is called form_1.pdf, then the resulting JavaScript script file will be called form_1.txt. Notice that a .txt extension is used instead of a .js extension.

This is so that the generated JavaScript code can be opened with a text editor, easily copied to the clipboard, and then pasted into the browser’s Developer tools Console window to be executed by pressing Enter.

The Final Execute Function is Ready!

Finally, we have the execute function.

def execute(args):
    try:
        fl = readList('myview.ini')
        fl_ext = readList('myview_ext.ini')
        if len(args) == 2:
            pdf_file_name = args[1]
            items = get_form_fields(pdf_file_name)
            createBrowserScript(fl, fl_ext, items, pdf_file_name)
        else:
            files = [f for f in os.listdir('.') if os.path.isfile(f) and f.endswith('.pdf')]
            for f in files:
                items = get_form_fields(f)
                createBrowserScript(fl, fl_ext, items, f)
    except BaseException as msg:
        print('An error occurred... :( ' + str(msg))

This function, as its name implies, essentially executes the rest of the Python script functions already described.

It starts off by reading the .ini files. Then, if a PDF form file name is passed to the Python script, it creates the corresponding JavaScript script file (with the .txt extension) for that file; otherwise, it creates an associated JavaScript script file (with the .txt extension) for each PDF form document found in the current folder, which is where the Python script resides.

So, you can either execute the Python script by passing a PDF form document name as a parameter to it, or run the Python script without any parameters, in which case the script will find every PDF form document in the folder and create a corresponding JavaScript script (.txt) file for each one.

If we have a folder with two PDF form documents, as well as our Python script and .ini files, then after executing the Python script without any parameters, we can expect two resulting .txt files, one for each PDF form document.
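As written, execute only accepts a single PDF name (len(args) == 2) and otherwise processes every .pdf file in the current folder. If you would like to pass several PDF files at once, one option, sketched here with the standard argparse module and hypothetical function names, is to let argparse collect any number of file names and fall back to the folder scan when none are given:

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse command-line arguments (argv=None means sys.argv[1:])."""
    parser = argparse.ArgumentParser(
        description='Generate browser-console JavaScript from PDF form documents.')
    parser.add_argument('pdf_files', nargs='*',
                        help='PDF form files (default: every .pdf in the current folder)')
    return parser.parse_args(argv)


def pdf_files_to_process(args, folder='.'):
    """Return the files named on the command line, or every .pdf in folder."""
    if args.pdf_files:
        return list(args.pdf_files)
    return sorted(str(p) for p in Path(folder).glob('*.pdf') if p.is_file())


if __name__ == '__main__':
    for pdf in pdf_files_to_process(parse_args()):
        # In the full script, this is where get_form_fields(pdf) and
        # createBrowserScript(fl, fl_ext, items, pdf) would be called.
        print('Would process', pdf)
```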

Each resulting JavaScript script (.txt) file can be opened with a text editor and its contents copied. Then you can simply navigate in your browser to the online form you wish to fill in, paste the copied JavaScript code into the Developer tools Console and press Enter to execute it.

You should then auto-magically see the fields of the web-based form filled in with the same values as those of the PDF form document.

You can then open the other resulting .txt file with a text editor, copy its code to the clipboard, navigate to the corresponding online form, open the Developer tools, paste the copied code into the Console and press Enter to execute it. Voila, the online form fields should be automatically populated. How cool is that!

Conclusion

We’ve accomplished something really cool: extracting the data contained within a PDF form document and automatically filling in the equivalent online form, using a relatively small and simple Python script.

The same technique described here, and exactly the same Python script, can be used to eliminate manual data entry for other online forms, not just this specific real-world example. The Python script and process are generic enough to work with any PDF form and online form, as long as the PDF form field names match the ids of the online form fields.

The key is to facilitate the data-acquisition process by creating PDF form documents that have the same field names as the fields present on the online form into which you want to automatically enter the acquired data.

Overall, this relatively simple technique, if applied correctly, can be a massive time-saver for manually intensive and time-consuming online data-entry tasks.

Once again, you can download the completed Python script, as well as the web form we used, and try it all for yourself!

Hopefully, you can also apply this technique and script in your day-to-day job, save valuable time and have some fun along the way. Thank you for reading, and until next time.

Written by Chris Castiglione and based on a project by Ed Freitas.

 

 
