Patstat beyond Europe
An insight into Patstat data from patent authorities other than EPO
By Gianluca Tarasconi
Madrid, 9/12/2010
What is PATSTAT
PATSTAT stands for EPO Worldwide Patent Statistical Database.
It contains a snapshot of the EPO master documentation database (DOCDB), which holds data from about 90 national and international patent offices with differing degrees of coverage.
Data include bibliographic data, citations and family links. This database is designed to be used for statistical research and requires the data to be loaded in the customer's own database.
http://www.epo.org/patents/patent-information/raw-data/test/product-14-24.html
http://forums.epo.org/epo-patstat-faqs/
Non EPO data vs APE-INV Name Game
Data from other patent authorities may help to:
Validate algorithms against other spellings/conventions;
Fill in missing data or correct existing data (for instance address/city) using data from equivalents;
Use patent family (1) data to improve algorithms, drawing on data from other family members to compute a similarity score.
(1) For a list of patent family definitions see: C. Martinez, Insight into Different Types of Patent Families, STI Working Paper 2010/2
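As a rough illustration of the similarity-score idea, the sketch below compares name strings as sets of tokens (Jaccard similarity), so that word order, case and punctuation do not matter. The function names are hypothetical and the choice of Jaccard is an assumption made here, not the method used in the Name Game; the sample strings are the spellings listed in the Mr Roberts example.

```python
import re

def name_tokens(raw: str) -> set[str]:
    """Upper-case a raw name, replace non-letters with spaces, return its tokens.

    Assumption: names are plain ASCII; accented letters would be dropped.
    """
    return set(re.sub(r"[^A-Z ]", " ", raw.upper()).split())

def name_similarity(a: str, b: str) -> float:
    """Jaccard similarity between the token sets of two name strings."""
    ta, tb = name_tokens(a), name_tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

# Same tokens in a different order
reordered = name_similarity("ROBERTS, TONY GORDON", "TONY GORDON ROBERTS")
# Middle name reduced to an initial
initial_only = name_similarity("ROBERTS, TONY G.", "ROBERTS, TONY, GORDON")
# Transliteration variant (TONI vs TONY)
transliterated = name_similarity("TONI GORDON ROBERTS", "Roberts, Tony Gordon")

print(reordered, initial_only, transliterated)
```

Reordered names score 1.0, while an initial or a transliteration variant lowers the score. A real matcher would probably add initial matching and edit distance on top of this.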
Example (I): inpadoc family # 75, Mr Roberts
PUBLN_AUTH  PUBLN_NR  INVT_SEQ_NR  CTRY_CODE  LAST_NAME, FIRST_NAME, ADDRESS, CITY (as recorded)
BG          98254     2            GB         ROBERTS, TONY G.
DK          0517145   2            GB         ROBERTS, TONY GORDON
EP          0517145   1            GB         Roberts, Tony Gordon, Glaxo Group Research Limited Park Road, Ware Hertfordshire, SG12 0DG
IE          921780    2                       TONY GORDON ROBERTS
RU          2102393   5                       TONI GORDON ROBERTS
US          5905082   2            GB         Roberts Tony Gordon Ware
WO          9221676   2            GB         ROBERTS, TONY, GORDON GLAXO GROUP RESEARCH LIMITED;PARK ROAD;WARE HERTFORDSHIRE SG12 0DG
6 different spellings of the name, 3 different addresses
In this case, name and city are better parsed in the data of the US equivalent.
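A minimal sketch of how a better-parsed equivalent could fill gaps in the other family members. It assumes the records have already been matched to the same inventor, and it uses hypothetical dictionary keys and helper names; only the country code and city values come from the example above, with None where a row carries no separately parsed value.

```python
from collections import Counter
from typing import Optional

# One record per family member, already matched to the same inventor (assumption).
family = [
    {"publn_auth": "BG", "ctry_code": "GB", "city": None},
    {"publn_auth": "IE", "ctry_code": None, "city": None},
    {"publn_auth": "RU", "ctry_code": None, "city": None},
    {"publn_auth": "US", "ctry_code": "GB", "city": "Ware"},
]

def most_common_value(records: list[dict], field: str) -> Optional[str]:
    """Most frequent non-empty value of a field across the family, if any."""
    values = [r[field] for r in records if r[field]]
    return Counter(values).most_common(1)[0][0] if values else None

def fill_from_equivalents(records: list[dict], fields: list[str]) -> list[dict]:
    """Return copies of the records with empty fields filled from equivalents."""
    donors = {f: most_common_value(records, f) for f in fields}
    return [{**r, **{f: r[f] or donors[f] for f in fields}} for r in records]

filled = fill_from_equivalents(family, ["ctry_code", "city"])
for record in filled:
    print(record)
```

The input records are left untouched, so the filled values can be flagged as inferred rather than original.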
Example (II): inpadoc family # 88, Mr Newman
WO patent data confirm that the correct address is 43111 Robbins Street
The US patent tells us that A. stands for Anthony
PUBLN_AUTH  PUBLN_NR  INVT_SEQ_NR  CTRY_CODE  LAST_NAME               ADDRESS                                   CITY
EP          0605442   1            US         NEWMAN, Roland, A.      43111 Robbins Street                      San Diego, CA 92122
EP          0854885   2            US         NEWMAN, Roland, A.      4311 Robbins Street                       San Diego, CA 92122
WO          9302108   1            US         NEWMAN, ROLAND, A.      43111 ROBBINS STREET;SAN DIEGO, CA 92122
US          6136310   2            US         Newman, Roland Anthony                                            San Diego
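The address check in this example can be sketched as a simple vote across family members. This is an illustration under assumptions: the helper name is hypothetical, the street part is taken as the first ';'-separated segment (the layout seen in the WO record), and a majority is only a hint, not proof, of the correct value.

```python
from collections import Counter

# Address strings as recorded for Mr Newman in the three documents that carry one.
addresses = {
    "EP 0605442": "43111 Robbins Street",
    "EP 0854885": "4311 Robbins Street",
    "WO 9302108": "43111 ROBBINS STREET;SAN DIEGO, CA 92122",
}

def street_key(address: str) -> str:
    """First ';'-separated segment, upper-cased, with whitespace collapsed."""
    return " ".join(address.split(";")[0].upper().split())

votes = Counter(street_key(a) for a in addresses.values())
consensus, support = votes.most_common(1)[0]
outliers = [doc for doc, a in addresses.items() if street_key(a) != consensus]

print(consensus, support, outliers)
```

Here two of the three records agree on 43111 ROBBINS STREET, and EP 0854885 is flagged as the outlier.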
What countries (I)
Patstat contains 92 application authorities:
45 are inside Europe;
47 are outside Europe;
it includes regional/international authorities (WIPO, ARIPO…);
it also includes ‘terminated’ authorities (GDR, USSR).
What countries (II)
1 Albania (AL)
2 ARIPO (AP)
3 Argentina (AR)
4 Austria (AT)
5 Australia (AU)
6 Bosnia and Herzegovina (BA)
7 Belgium (BE)
8 Bulgaria (BG)
9 Brazil (BR)
10 Canada (CA)
11 Switzerland (CH)
12 Chile (CL)
13 China (CN)
14 Costa Rica (CR)
15 Czechoslovakia (CS)
16 Cuba (CU)
17 Cyprus (CY)
18 Czech Republic (CZ)
19 German Democratic Republic (DD)
20 Germany (DE)
21 Denmark (DK)
22 Algeria (DZ)
23 Eurasia (EA)
24 Ecuador (EC)
25 Estonia (EE)
26 Egypt (EG)
27 European Patent Office (EP)
28 Spain (ES)
29 Finland (FI)
30 France (FR)
31 Great Britain (GB)
32 Gulf Cooperation Council (GC)
33 Georgia (GE)
34 Greece (GR)
35 Hong Kong S.A.R. (HK)
36 Croatia (HR)
37 Hungary (HU)
38 Indonesia (ID)
39 Ireland (IE)
40 Israel (IL)
41 India (IN)
42 Iceland (IS)
43 Italy (IT)
44 Japan (JP)
45 Kenya (KE)
46 Korea (South) (KR)
47 Liechtenstein (LI)
48 Lithuania (LT)
49 Luxembourg (LU)
50 Latvia (LV)
51 Morocco (MA)
52 Monaco (MC)
53 Moldova (MD)
54 Republic of Montenegro (ME)
55 Former Yugoslav Republic of Macedonia (MK)
56 Mongolia (MN)
57 Malta (MT)
58 Malawi (MW)
59 Mexico (MX)
60 Malaysia (MY)
61 Nicaragua (NI)
62 Netherlands (NL)
63 Norway (NO)
64 New Zealand (NZ)
65 OAPI (OA)
66 Panama (PA)
67 Peru (PE)
68 The Philippines (PH)
69 Poland (PL)
70 Portugal (PT)
71 Romania (RO)
72 Republic of Serbia (RS)
73 Russia (RU)
74 Sweden (SE)
75 Singapore (SG)
76 Slovenia (SI)
77 Slovakia (SK)
78 San Marino (SM)
79 Soviet Union (SU)
80 El Salvador (SV)
81 Tajikistan (TJ)
82 Turkey (TR)
83 Taiwan (TW)
84 Ukraine (UA)
85 United States of America (US)
86 Uruguay (UY)
87 Viet Nam (VN)
88 World Intellectual Property Organization (WO)
89 Former Serbia and Montenegro (YU)
90 South Africa (ZA)
91 Zambia (ZM)
92 Zimbabwe (ZW)
(last updated 19.4.2010)
What dimensions are relevant
A) Data coverage (% of coverage by year)
Are data from patent authority X 100% included in Patstat from year W to year Z?
B) Data transmission delays
How long does it take a non-EPO patent to appear in PATSTAT?
C) Completeness of geographic data
What is the quality (and coverage) of address / city / country code?
Data coverage (I)
EPO gives only partial information:
http://www.epo.org/patents/patent-information/data-quality.html
http://www.epo.org/patents/patent-information/raw-data/useful-tables.html
The total number of applications is given, but not the share of all applications filed that it represents (EPO passes on what it receives)
Data coverage (II): example on India
CC  Authority  DATE (from, to)         NUMBERS (from, to)  Kind of data  DOCDB KIND CODE  Kind Group  Last input week
IN  India      02/08/1975, 11/05/2007  137485, 203704      Patent        A1, E            P           2005/52
According to the EPO table, Patstat holds 66219 Indian applications (203704 - 137485)
The Indian Patent Office reports 28,882 applications filed in 2006 alone
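The gap between the two figures can be made concrete with a little arithmetic. The sketch below takes the number range from the EPO table and the 2006 filing count quoted above; variable names are illustrative, and treating the number range as an application count follows the slide rather than any EPO definition.

```python
first_number, last_number = 137485, 203704  # NUMBERS range for IN in the EPO table
in_patstat = last_number - first_number     # Indian applications reported in Patstat
filed_2006 = 28882                          # filings reported by the Indian Patent Office for 2006

# How many years of filings, at the 2006 rate, the whole Patstat holding represents
years_equivalent = in_patstat / filed_2006

print(in_patstat, round(years_equivalent, 2))
```

At the 2006 filing rate, the whole Indian holding in Patstat, which spans 1975 to 2007, corresponds to a little over two years of filings, which points to partial coverage.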
Data transmission delays (I)
We study the 2003-2008 time series for BR, CN, JP, DE, KR and IN compared to EP;
Differences between the graphs suggest that publication lags and data transmission lags differ from country to country;
Time series may also highlight ‘holes’ or changes of population (for instance USPTO from 2000 onward)
Year  BR     CN      DE      EP      IN    JP      KR
2003  20878  205557  134623  137230  1047  432789  108922
2004  22811  235189  111554  145312  1115  443034  129515
2005  23922  287662  105002  154398  1687  447845  160590
2006  13414  341493  95404   160288  1966  428966  183037
2007  9197   382948  83663   160275  2195  405234  187712
2008  7340   404476  73819   139610  2493  356748  175785
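One simple way to spot candidate ‘holes’ in series like these is to flag sharp year-on-year drops. The sketch below uses the counts from the table; the -25% threshold and the function names are arbitrary assumptions, and a flagged drop may reflect transmission lag, publication lag or a real decline.

```python
years = [2003, 2004, 2005, 2006, 2007, 2008]
counts = {
    "BR": [20878, 22811, 23922, 13414, 9197, 7340],
    "CN": [205557, 235189, 287662, 341493, 382948, 404476],
    "DE": [134623, 111554, 105002, 95404, 83663, 73819],
    "EP": [137230, 145312, 154398, 160288, 160275, 139610],
    "IN": [1047, 1115, 1687, 1966, 2195, 2493],
    "JP": [432789, 443034, 447845, 428966, 405234, 356748],
    "KR": [108922, 129515, 160590, 183037, 187712, 175785],
}

def yoy_changes(series: list[int]) -> list[float]:
    """Relative change from each year to the next."""
    return [(b - a) / a for a, b in zip(series, series[1:])]

def sharp_drops(threshold: float = -0.25) -> list[tuple[str, int, int]]:
    """(authority, year, change in %) wherever the drop reaches the threshold."""
    flagged = []
    for auth, series in counts.items():
        for year, change in zip(years[1:], yoy_changes(series)):
            if change <= threshold:
                flagged.append((auth, year, round(change * 100)))
    return flagged

print(sharp_drops())
```

With this threshold only BR is flagged (2006 and 2007), while the gradual declines in DE and JP stay below it.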
Data transmission delays (II)
Completeness of geographic data
Table for the top 20 authorities by inventor count;
13 authorities have more than 20% of records with no country code (that is, fewer than 80% of their records carry one);
12 authorities have no address or city data at all (0%);
However, in many cases address data sit inside the first name field (for instance: DE)
(data from Patstat 09/2009)
APPLNAUTH  inventors  no state  no zip  no country  no address  no city
US         5960856    86%       98%     21%         97%         25%
EP         3705123    100%      100%    0%          1%          1%
DE         2750079    100%      100%    33%         100%        100%
JP         1798271    100%      100%    98%         99%         100%
CN         1537587    100%      100%    2%          100%        100%
CA         1120490    100%      100%    45%         100%        100%
AU         1087573    100%      100%    98%         100%        100%
SU         968915     100%      100%    41%         100%        100%
AT         653048     100%      100%    29%         100%        100%
KR         637296     100%      100%    14%         100%        100%
FR         565254     100%      100%    98%         99%         100%
GB         531087     100%      100%    70%         65%         100%
RU         394691     100%      100%    29%         100%        100%
CH         338739     100%      100%    11%         100%        100%
BR         292047     100%      100%    89%         100%        100%
SE         256248     100%      100%    85%         98%         100%
FI         212722     100%      100%    11%         43%         100%
IT         192460     100%      100%    74%         100%        100%
ES         133471     100%      100%    17%         100%        100%
DD         129845     100%      100%    7%          97%         100%
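The summary counts can be rechecked directly against the table. The sketch below copies three of the percentage columns and tallies them; the 20% threshold for missing country codes is the one at which the table yields 13 authorities (at 80% it yields 5), and the variable names are illustrative.

```python
# (authority, % no country, % no address, % no city), copied from the table
rows = [
    ("US", 21, 97, 25), ("EP", 0, 1, 1), ("DE", 33, 100, 100),
    ("JP", 98, 99, 100), ("CN", 2, 100, 100), ("CA", 45, 100, 100),
    ("AU", 98, 100, 100), ("SU", 41, 100, 100), ("AT", 29, 100, 100),
    ("KR", 14, 100, 100), ("FR", 98, 99, 100), ("GB", 70, 65, 100),
    ("RU", 29, 100, 100), ("CH", 11, 100, 100), ("BR", 89, 100, 100),
    ("SE", 85, 98, 100), ("FI", 11, 43, 100), ("IT", 74, 100, 100),
    ("ES", 17, 100, 100), ("DD", 7, 97, 100),
]

# Authorities where more than 20% of inventor records lack a country code
missing_country = [auth for auth, ctry, _, _ in rows if ctry > 20]
# Same tally with an 80% threshold, for comparison
mostly_missing_country = [auth for auth, ctry, _, _ in rows if ctry > 80]
# Authorities with neither address nor city on any record
no_address_or_city = [auth for auth, _, addr, city in rows
                      if addr == 100 and city == 100]

print(len(missing_country), len(mostly_missing_country), len(no_address_or_city))
```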
Conclusions
Non-EPO data have coverage, quality and ‘spelling’ that may vary widely from one patent authority to another;
These data can be used as an additional source of information but not as the main source (BONUS not MALUS);
EPO could probably improve the quality of these data, especially by adding more addresses (for instance, in April 2011 it will release WO address data); it is up to users to demand more on this topic.